PREFACE TO FIRST EDITION
OF VOLUME II 


This volume falls into five sections. The first, comprising chapters 17 to 20, deals
with Estimation. The second, comprising chapters 21, 23, 24 and 26 to 28, covers the
theory of Statistical Tests, including the Analysis of Variance and Multivariate Analysis.
The third, consisting of chapter 22, deals with Regression Analysis and completes the
account of statistical relationship begun in chapters 13 to 16 of Volume I. In the fourth,
chapter 25, I have tried to give an introductory account of the reaction of theoretical
considerations on the Design of Statistical Inquiries. Finally, the fifth, comprising chapters
29 and 30, deals with the Analysis of Time-Series.

The literature of statistical theory is now so vast that it seemed worth while devoting
considerable space to a bibliography, which is given in Appendix B. Although it is far
from complete, I hope that it will serve its purpose in guiding the student to the main
sources.
The chief problem in the writing of this volume arose in connection with the logic of
statistical inference. Whenever possible I have kept the treatment objective. It is, I
consider, unfair in a book of this kind not to present all sides of a case, particularly when
there is so much disagreement among the authorities. Some day I hope to show that
this disagreement is more apparent than real, and that all the existing theories of inference
in probability differ essentially only in matters of taste in the choice of postulates. But
this book is not the place for such work, and for the present I am content to state the
position and to leave the reader to exercise his own choice.

The difficulty became most acute in dealing with confidence intervals and fiducial
inference, where two approaches which at first sight appear identical can lead to different
results. Rather than try to reconcile them I have written a separate chapter on each.
Professor E. S. Pearson was kind enough to read the manuscript of chapter 19 and Professor
R. A. Fisher that of chapter 20, so that I think their respective views are, at any rate, not
misrepresented. I am very grateful to them both for their help in this connection.

My thanks are also due to Mr. P. A. Moran and Mr. A. J. H. Morrell, who cheerfully
undertook to help with the proof reading and to whose painstaking scrutiny I owe the
removal of a number of obscurities and errors. I shall be grateful to any reader who
detects and notifies me of any further slips which have evaded us. Once again I have also
to thank the publishers and the printers for the trouble they have taken in the production
of the finished work.

M. G. K.
LONDON,
April, 1946.


PREFACE TO SECOND EDITION 
OF VOLUME II 


A few misprints have been corrected, but otherwise this edition is the same as its
predecessor. The exhaustion of the first edition in little more than a year has been a
very gratifying sign that the book is filling a need both at home and abroad.

M. G. K.
LONDON,
August, 1947.
CHAPTER 17
ESTIMATION: LIKELIHOOD 


The Problem

17.1. On several occasions in previous chapters we have encountered the problem
of estimating from a sample the values of the parameters of the parent population. We
have hitherto dealt on somewhat intuitive lines with such questions as arose; for example,
in the theory of large samples we have taken the means and moments of the sample to be
satisfactory estimates of the corresponding means and moments in the parent.
We now proceed to study this branch of the subject in more detail. In the earlier
part of the present chapter we shall examine the sort of criteria which are required of
a “good” estimate and discuss the question whether there exist “best” estimates in
any acceptable sense of the term. In the remainder of the chapter and in Chapter 18
we shall consider various methods of obtaining estimates with the required properties.
In Chapters 19 and 20 we shall look at the same problem from a rather different point of
view and discuss the theories of confidence intervals and fiducial limits.
17.2. It will be evident that if a sample is not random and nothing precise is known
about the nature of the bias operating when it was chosen, very little can be inferred from
it about the parent population. Certain conclusions of a trivial kind are sometimes
possible; for instance, if we take ten turnips from a pile of 100 and find that they weigh ten
pounds altogether, the mean weight of turnips in the pile must be greater than one-tenth of
a pound; but such information is rarely of value, and estimation based on biassed samples
remains very much a matter of individual opinion and cannot be reduced to exact and
objective terms. We shall therefore confine our attention to random samples only. Our
general problem, in its simplest terms, is then to estimate the value of a parameter in the
parent from the information given by the sample. In the first instance we consider
the case when only one parameter is to be estimated. The case of several parameters
will be discussed later.


17.3. Let us in the first place consider what we mean by “estimation”. We know,
or assume as a working hypothesis, that the parent population is distributed in a form
which would be completely determinate if we knew the value of some parameter $\theta$. We
are given a sample of values $x_1, \ldots, x_n$. We require to determine, with the aid of the
$x$'s, a number which can be taken to be the value of $\theta$, or a range of numbers which can
be taken to include that value.

Now a single sample, considered by itself, may be rather improbable, and any estimate
based on it may therefore differ considerably from the true value of $\theta$. It appears,
therefore, that we cannot expect to find any method of estimation which can be guaranteed
to give us a close estimate of $\theta$ on every occasion and for every sample. We must
content ourselves with formulating a rule which will give good results “in the long run”
or “on the average”, or which has “a high probability of success”, phrases which
express the fundamental fact that we have to regard our method of estimation as generating
a population of estimates and to assess its merits according to the properties of this
population.
17.4. It will clarify our ideas considerably if we draw a distinction between the
method or rule of estimation, which, following Pitman, we shall call an Estimator, and the
value to which it gives rise in particular cases, the Estimate. The distinction is the same
as that between a function $f(x)$, regarded as defined for a range of the variable $x$, and the
particular value which the function assumes, say $f(a)$, for a specified value of $x$ equal to $a$.
Our problem is not to find estimates, but to find Estimators. We do not reject a method
because it gives a bad result in a particular case (in the sense that the estimate differs
materially from the true value). We should only reject it if it gave bad results in the long
run, that is to say, if the population of possible values of the estimator were seriously
discrepant with the value of $\theta$. The merit of the estimator is judged by the population
of estimates to which it gives rise. The estimator is itself a random variable and has a
distribution to which we shall frequently have occasion to refer.


17.5. In the theory of large samples we have often taken as an estimator of a parameter
$\theta$ a statistic $t$ calculated from the sample in exactly the same way as $\theta$ is calculated
from the population, e.g. the sample-mean is taken as an estimate of the parent mean.
Let us examine how this procedure can be justified. Consider the case when the parent
population is

$$ dF = \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(x - \theta)^2\}\,dx, \qquad -\infty \le x \le \infty. \tag{17.1} $$

Requiring an estimator for the parent mean $\theta$, we take

$$ t = \frac{1}{n}\sum x. \tag{17.2} $$

The distribution of $t$ is

$$ dF = \sqrt{\frac{n}{2\pi}} \exp\left\{-\frac{n}{2}(t - \theta)^2\right\} dt, \qquad -\infty \le t \le \infty, \tag{17.3} $$

that is to say, $t$ is distributed normally about $\theta$ with variance $1/n$. We notice two things
about this distribution: (a) it has a mean (and median and mode) at the true value $\theta$,
and (b) as $n$ increases, the scatter of possible values of $t$ about $\theta$ becomes smaller, so that
the probability that a given $t$ differs by more than a fixed amount from $\theta$ decreases. We
may say that the accuracy of the estimator increases as $n$ increases, or simply with $n$.
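
The behaviour described in 17.5 can be checked by simulation. The following is a minimal sketch and not part of the original text: the function name `sample_mean_summary`, the seed and the number of trials are illustrative assumptions. It draws repeated samples from a unit-variance normal parent and summarises the resulting population of estimates, whose mean should lie near the true value and whose variance should lie near $1/n$.

```python
import random
import statistics


def sample_mean_summary(n, theta=0.0, trials=5000, seed=0):
    """Return (mean, variance) of the sample mean t over repeated samples of size n.

    Assumes a normal parent with mean theta and unit variance, as in (17.1).
    """
    rng = random.Random(seed)
    estimates = [
        statistics.fmean([rng.gauss(theta, 1.0) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.fmean(estimates), statistics.pvariance(estimates)


if __name__ == "__main__":
    for n in (4, 16, 64):
        mean_t, var_t = sample_mean_summary(n)
        # var_t should be close to 1/n and mean_t close to theta = 0.
        print(n, round(mean_t, 3), round(var_t, 4), 1 / n)
```

The printed variances are Monte Carlo approximations, so they agree with $1/n$ only to within sampling error.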


17.6. Generally, it will be clear that the phrase “accuracy increasing with $n$” has
a definite meaning whenever the sampling distribution of $t$ has a variance which decreases
with $1/n$ and a central value which is either identical with $\theta$ or differs from it by a quantity
which also decreases with $1/n$. Many of the estimators with which we are commonly
concerned are of this type, but there are exceptions. Consider, for example, the Cauchy
population

$$ dF = \frac{1}{\pi}\,\frac{dx}{1 + (x - \theta)^2}, \qquad -\infty \le x \le \infty. \tag{17.4} $$

The mean (assuming that we conventionally agree that it exists) is at $x = \theta$. But if we
try to estimate $\theta$ by the mean-statistic $t$ we have, for the distribution of $t$,

$$ dF = \frac{1}{\pi}\,\frac{dt}{1 + (t - \theta)^2}, \qquad -\infty \le t \le \infty. \tag{17.5} $$

(Cf. Example 10.1, vol. I, pp. 233-4.) In this case the distribution of $t$ is the same
as that of any single value of the sample, and does not increase in accuracy as $n$ increases.
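
The contrast between the normal and Cauchy parents can be illustrated numerically. The sketch below is not from the original text, and the helper names, seed and trial count are assumptions chosen for illustration. Because the Cauchy sample mean has no variance, the interquartile range of the simulated estimates is used as the measure of scatter. For the Cauchy population it should stay near 2 (the interquartile range of a single observation) whatever the sample size, while for the normal population it shrinks as $n$ grows.

```python
import math
import random
import statistics


def iqr_of_sample_mean(draw, n, trials=4000, seed=1):
    """Interquartile range of the sample mean over repeated samples of size n."""
    rng = random.Random(seed)
    means = [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(trials)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


def draw_cauchy(rng, theta=0.0):
    # Inverse-CDF draw from the Cauchy population centred at theta.
    return theta + math.tan(math.pi * (rng.random() - 0.5))


def draw_normal(rng, theta=0.0):
    # Draw from the unit-variance normal population centred at theta.
    return rng.gauss(theta, 1.0)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n,
              round(iqr_of_sample_mean(draw_cauchy, n), 2),
              round(iqr_of_sample_mean(draw_normal, n), 2))
```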
Consistence

17.7. The property of possessing increasing accuracy is evidently a very desirable
one; and indeed, if the variance of the sampling distribution decreases with increasing
$n$ it is necessary that its central value should tend to $\theta$, for otherwise the estimator would
have values differing systematically from the true value and would be useless, not to say
dangerous. We therefore formulate our first criterion for a suitable estimator as follows:

An estimator $t_n$, computed from a sample of $n$ values, will be said to be a consistent
estimator of $\theta$ if, for any positive $\epsilon$ and $\eta$, however small, there is some $N$ such that the
probability that

$$ |t_n - \theta| < \epsilon \tag{17.6} $$

is greater than $1 - \eta$ for all $n > N$. In the notation of the theory of probability,

$$ P\{|t_n - \theta| < \epsilon\} > 1 - \eta, \qquad n > N. \tag{17.7} $$

The definition bears an obvious analogy to the definition of convergence in the
mathematical sense. Given any fixed small quantity $\epsilon$ we can find a large enough sample number
such that for all samples over that size the probability that $t$ differs from the true value
by more than $\epsilon$ is as near zero as we please. $t_n$ is said to converge in probability to $\theta$. Thus
$t$ is a consistent estimate of $\theta$ if it converges to $\theta$ in probability.


Example 17.1

The sample mean is a consistent estimator of the parameter $\theta$ in the population (17.1).
This we have already established in general argument, but more formally the proof would
proceed as follows:

Suppose we are given $\epsilon$. From (17.3) we see that $(t - \theta)\sqrt{n}$ is distributed normally
about zero with unit variance. Thus the probability that $|(t - \theta)\sqrt{n}| < \epsilon\sqrt{n}$ is the
value of the normal integral between limits $\pm\epsilon\sqrt{n}$. Given any positive $\eta$, we can
always take $n$ large enough for this quantity to be greater than $1 - \eta$ and it will continue
to be so for any larger $n$. $N$ may therefore be determined and the inequality (17.7) is
satisfied.
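
The argument of Example 17.1 can be turned into a small calculation. The following sketch is illustrative only: the function names `coverage` and `smallest_n` and the chosen values of $\epsilon$ and $\eta$ are assumptions, not part of the original text. It evaluates the normal integral between $\pm\epsilon\sqrt{n}$ by means of the error function and searches for the smallest $n$ at which it exceeds $1 - \eta$.

```python
import math


def coverage(n, eps):
    """P(|t - theta| < eps) for the mean of n unit-variance normal observations."""
    # (t - theta) * sqrt(n) is standard normal, so the probability is the
    # normal integral between -eps*sqrt(n) and +eps*sqrt(n).
    return math.erf(eps * math.sqrt(n / 2.0))


def smallest_n(eps, eta):
    """Smallest n for which the coverage exceeds 1 - eta."""
    n = 1
    while coverage(n, eps) <= 1.0 - eta:
        n += 1
    return n


if __name__ == "__main__":
    print(smallest_n(0.1, 0.05))
```

With $\epsilon = 0.1$ and $\eta = 0.05$ the search returns 385. Since the coverage increases with $n$, the inequality then holds for every larger sample, so any $N \ge 384$ serves in the definition.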


Example 17.2 

Suppose we have a statistic $t_n$ whose mean value differs from $\theta$ by order $n^{-1}$, whose
variance $v_n$ is of order $n^{-1}$ and which tends to normality as $n$ increases. Clearly
$(t_n - \theta)/\sqrt{v_n}$ will then tend to zero in probability and $t_n$ will be consistent. This covers
a great many statistics encountered in practice.


Unbiassed Estimators

17.8. The property of consistence is a limiting property, that is to say, it concerns
the behaviour of an estimator as the sample number tends to infinity. It requires nothing
of the behaviour for finite $n$, and if there exists one consistent estimator $t$, we may construct
infinitely many others; e.g. for fixed $a$ and $b$,

$$ \frac{n - a}{n - b}\,t $$

is also consistent. We have seen that in some circumstances a consistent estimator of the
mean is the sample mean

$$ \bar{x} = \frac{1}{n}\sum x. \tag{17.8} $$

But so is

$$ \frac{1}{n - 1}\sum x. \tag{17.9} $$

Why do we prefer one to the other? Intuitively it seems absurd to divide the sum of
$n$ quantities by anything other than their number $n$. We shall see in a moment, however,
that intuition is not a very reliable guide on such matters. There are reasons for preferring

$$ \frac{1}{n - 1}\sum (x - \bar{x})^2 \tag{17.10} $$

to

$$ \frac{1}{n}\sum (x - \bar{x})^2 \tag{17.11} $$

as an estimator of the parent variance, notwithstanding that the latter is the sample
variance.

17.9. Consider the sampling distribution of an estimator $t$. If the estimator is
consistent, its distribution must, for large samples, have a central value in the neighbourhood
of $\theta$. We may choose among the field of consistent estimators by requiring that
$\theta$ shall be equated to this central value not merely for large, but for all samples. Whether
we choose as the appropriate central value the mean, the median or the mode is to some
extent a matter of taste. We shall consider below what follows if we select the mode
(which gives us the maximum likelihood estimators). For the present we discuss the mean.
If we require that for all $n$ the mean value of $t$ shall be $\theta$, we define what is known as
an unbiassed estimator.


bias. We might equally well have chosen the mode as determining the “unbiassed”
estimator, in which case the mean estimator would be “biassed” whenever it gave a
different result. Since the use of “unbiassed” in connection with the mean is fairly
widespread, however, we shall continue to use it.*


Example 17.3

Since

$$ E\left\{\sum (x - \bar{x})^2\right\} = E\left\{\sum x^2 - \frac{1}{n}\Big(\sum x\Big)^2\right\} $$

$$ = E\left\{\frac{n - 1}{n}\sum x^2 - \frac{1}{n}\sum_{j \ne k} x_j x_k\right\} $$

$$ = (n - 1)\mu_2' - (n - 1)\mu_1'^2 $$

$$ = (n - 1)\mu_2, $$

it follows that $\frac{1}{n}\sum (x - \bar{x})^2$ has a mean value $\frac{n - 1}{n}\mu_2$. On the other hand, an unbiassed
estimator is given by

$$ \frac{1}{n - 1}\sum (x - \bar{x})^2, $$

and for this reason it is sometimes preferred to the sample variance. There are other
reasons which will appear when we come to study the analysis of variance.

* The word has already occurred in vol. I, p. 200, in this sense. It may be spelt with either one
or two s's. My usage, I am afraid, is not consistent, but in this volume I use two.
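
The result of Example 17.3 can be verified exactly for a small discrete population by enumerating every possible sample. The sketch below is an illustration under stated assumptions: the population values, the sample size and the function name `expected_variance_estimators` are hypothetical, and sampling is taken to be random with replacement so that the observations are independent. Exact rational arithmetic is used so that no rounding error enters.

```python
from fractions import Fraction
from itertools import product


def expected_variance_estimators(values, n):
    """Exact expectations of the divisor-n and divisor-(n-1) variance estimators.

    The population gives equal weight to each entry of `values`; all samples
    of size n drawn with replacement are enumerated.
    Returns (population variance, E[divisor n], E[divisor n-1]).
    """
    values = [Fraction(v) for v in values]
    mu = sum(values) / len(values)
    mu2 = sum((v - mu) ** 2 for v in values) / len(values)
    total_n = Fraction(0)
    total_n1 = Fraction(0)
    count = 0
    for sample in product(values, repeat=n):
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)
        total_n += ss / n
        total_n1 += ss / (n - 1)
        count += 1
    return mu2, total_n / count, total_n1 / count


if __name__ == "__main__":
    mu2, e_n, e_n1 = expected_variance_estimators([1, 2, 6], 3)
    print(mu2, e_n, e_n1)
```

For the population $\{1, 2, 6\}$ and $n = 3$ the population variance is $14/3$; the divisor $n - 1$ reproduces it exactly, while the divisor $n$ gives $\frac{n-1}{n}\mu_2 = 28/9$.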


Efficient Estimators

17.10. In general there will exist more than one consistent estimator of a parameter,
even if we confine ourselves only to unbiassed estimators. Consider once again the
estimation of the mean of a normal population with known variance. The sample mean is
consistent and unbiassed. We will now prove that the same is true of the median.

Consideration of symmetry is enough to show that the median is an unbiassed estimate
of the parent mean, which is, of course, the same as the parent median. For large $n$ the
distribution of the median tends to the normal form (cf. Example 9.7, vol. I, p. 213),

$$ dF \propto \exp\{-2 n f_1^2 (x - \theta)^2\}\,dx, \tag{17.13} $$

where $f_1$ is the median ordinate of the parent, in our present case $1/\sqrt{2\pi} = 0.3989$. The
variance tends to zero and the estimator is consistent. Its variance is $\pi/2n$.


17.11. We are therefore at liberty to seek for further criteria to choose between
estimators with the common property of consistence. Such a criterion arises naturally
if we consider the sampling variances of the estimators. Generally speaking, the estimator
with the smaller variance will be grouped more closely round the value $\theta$; this will certainly
be so for distributions of the normal type. An estimator with a smaller variance will
therefore deviate less, on the average, from the true value than one with a larger variance.
Hence we may reasonably regard it as better or more efficient.

If, of two consistent estimators $t_1$ and $t_2$, we have var $t_1$ < var $t_2$ for all $n$, then $t_1$ is
more efficient than $t_2$ for all sample sizes. It is possible to have var $t_1$ < var $t_2$ for some
ranges of $n$ and var $t_1$ > var $t_2$ for others, in which case the estimators are more or less
efficient in different ranges.

In the case of mean and median we have, for any $n$,

$$ \text{var (mean)} = \frac{\sigma^2}{n}, \tag{17.14} $$

and for large $n$

$$ \text{var (median)} = \frac{\pi\sigma^2}{2n}, \tag{17.15} $$

where $\sigma^2$ is the parent variance. Since $\pi/2 = 1.57 > 1$ the mean is more efficient than
the median for large $n$ at least. For small $n$ we have to work out the variance of the median.
The following values, expressed as multiples of $\sigma^2/n$, may be obtained from those given in
Table XXIII of Tables for Statisticians and Biometricians, Part II:

    n               2      3      4      5
    var (median)    1.00   1.35   1.19   1.44

It appears that the mean is always more efficient than the median in estimating the
parameter $\theta$ for the normal distribution (17.1).
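
The comparison of mean and median in normal samples can be reproduced approximately by simulation. This is a minimal sketch rather than the method of the text: the function name `mean_and_median_variances`, the seed and the number of trials are assumptions. For a unit-variance normal parent it returns Monte Carlo estimates of the two sampling variances, so that $n \times$ var (median) can be set against the tabulated values and, for large $n$, against $\pi/2$.

```python
import random
import statistics


def mean_and_median_variances(n, trials=4000, seed=2):
    """Monte Carlo sampling variances of the mean and median of n normal values."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)


if __name__ == "__main__":
    for n in (3, 5, 101):
        var_mean, var_median = mean_and_median_variances(n)
        # n * var_median is comparable with the tabulated multiples of sigma^2/n.
        print(n, round(n * var_mean, 3), round(n * var_median, 3))
```

The figures are subject to sampling error of a few per cent, so they should be read as a check on orders of magnitude rather than as exact values.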
Example 17.4

For the Cauchy distribution

$$ dF = \frac{1}{\pi}\,\frac{dx}{1 + (x - \theta)^2}, \qquad -\infty \le x \le \infty, $$

we have already seen that the sample mean is not a consistent estimator. However, for
the median in large samples we have, since the median ordinate is $1/\pi$,

$$ \text{var (median)} = \frac{\pi^2}{4n}. $$

It is seen that the median is consistent, and although direct comparison with the mean
is not possible because the latter does not possess a sampling variance, the median is
evidently a better estimator for $\theta$ than the mean. This provides an interesting contrast with
the case of the normal parent, particularly in view of the similarity of the parent
frequency-distributions.
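
The large-sample variance $\pi^2/4n$ of the Cauchy median can be checked by simulation. The sketch below is illustrative: the function name `cauchy_median_variance`, the seed and the trial count are assumptions. Observations are generated by the inverse-CDF transformation $\theta + \tan\{\pi(u - \tfrac{1}{2})\}$ of a uniform variate $u$.

```python
import math
import random
import statistics


def cauchy_median_variance(n, theta=0.0, trials=4000, seed=3):
    """Monte Carlo sampling variance of the median of n Cauchy observations."""
    rng = random.Random(seed)
    medians = []
    for _ in range(trials):
        sample = [theta + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
        medians.append(statistics.median(sample))
    return statistics.pvariance(medians)


if __name__ == "__main__":
    n = 101
    # Compare the simulated variance with the large-sample value pi^2 / (4n).
    print(round(cauchy_median_variance(n), 4), round(math.pi ** 2 / (4 * n), 4))
```

For moderate $n$ the simulated variance may be expected to sit slightly above the limiting value, since the formula is only asymptotic.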

17.12. In some cases, as we shall see below, there exist consistent estimators whose
sampling variance for large samples is less than that of any other such estimator. We
shall call such estimators most-efficient. When they exist they provide a standard of
measurement of efficiency. In fact, if $t_1$ has variance $v_1$ and the most-efficient estimator
$t_2$ has variance $v_2$, the efficiency $E$ of $t_1$ is defined as

$$ E = \frac{v_2}{v_1}. \tag{17.16} $$

It will be seen later that in normal samples the mean is a most-efficient estimator, so that
the efficiency of the median for such samples is

$$ \frac{\sigma^2}{n} \Big/ \frac{\pi\sigma^2}{2n} = \frac{2}{\pi} = 0.637. $$


17.13. If we have a sample of 100 members the variance of the median (assuming
normality) will be about the same as that of the mean in only 64 members. Thus, if
sampling variance be accepted as a criterion of accuracy of estimation, the use of the median
instead of the mean sacrifices about 36 observations in 100. It is not possible to economise
by using a different estimator than the mean.

Other things being equal, the estimator with the greater efficiency is undoubtedly
the one to use. But sometimes other things are not equal. It may, and does, happen
that a most-efficient estimate derived from $t_1$ is more troublesome to calculate than an
alternative $t_2$. The extra labour involved in calculation may be greater than the saving
in dealing with a smaller sample number, particularly if there are plenty of further
observations to hand.

Example 17.5

Consider the estimation of the standard deviation of a normal population with variance
$\sigma^2$ and unknown mean. Two possible estimators are the standard deviation of the sample
(or the square-root of $\sum (x - \bar{x})^2/(n - 1)$ if it is desired to use an unbiassed estimator)
and the mean deviation of the sample multiplied by $\sqrt{\pi/2}$ (cf. 5.20). The latter is
easier to calculate, as a rule, and if we have plenty of observations (as, for example, if we
are finding the standard deviation of a set of barometric records and the addition of further
members to the sample is merely a matter of turning up more records) it may be worth
while estimating from the mean-deviation rather than from the standard deviation.
In normal samples the variance of the mean-deviation is (9.13)

$$ \frac{2\sigma^2}{\pi}\,\frac{n - 1}{n^2}\left\{\frac{\pi}{2} + \sqrt{n(n - 2)} - n + \arcsin\frac{1}{n - 1}\right\} \sim \frac{\sigma^2}{n}\left(1 - \frac{2}{\pi}\right). \tag{17.17} $$

The variance of the estimator from the mean deviation is then approximately

$$ \frac{\sigma^2}{n}\left(\frac{\pi - 2}{2}\right). \tag{17.18} $$

Now the variance of the standard deviation is (9.22) $\sigma^2/2n$, and we shall see later that it
is a most-efficient estimator. Thus the efficiency of the estimator from the mean deviation is

$$ \frac{\sigma^2}{2n} \cdot \frac{n}{\sigma^2} \cdot \frac{2}{\pi - 2} = \frac{1}{\pi - 2} = 0.876. $$

The accuracy of the estimate from the mean deviation of a sample of 1000 is then about
the same as that from the standard deviation of a sample of 876. If it is easier to calculate
the m.d. of 1000 observations than the s.d. of 876 and there is no shortage of observations,
it may be more convenient to use the former.

It has to be remembered, nevertheless, that in adopting such a procedure we are
deliberately wasting information. By taking greater pains we could improve the efficiency
of our estimate from 0.876 to unity, or by about 14 per cent. of the former value.
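
The arithmetic of Example 17.5 can be set out as a short calculation. This is a sketch for illustration only: the function names are hypothetical, and the two variance formulae are the large-sample approximations $\sigma^2/2n$ and $\sigma^2(\pi - 2)/2n$ discussed in the example.

```python
import math


def variance_sd_estimator(sigma2, n):
    """Large-sample variance of the sample standard deviation, sigma^2 / (2n)."""
    return sigma2 / (2 * n)


def variance_md_estimator(sigma2, n):
    """Large-sample variance of sqrt(pi/2) times the mean deviation."""
    return sigma2 * (math.pi - 2) / (2 * n)


# Efficiency of the mean-deviation estimator relative to the standard deviation.
efficiency = variance_sd_estimator(1.0, 1000) / variance_md_estimator(1.0, 1000)
# Sample size at which the standard deviation matches a mean deviation from 1000.
equivalent_n = efficiency * 1000
# Proportional improvement obtained by moving from efficiency 0.876 to unity.
gain = 1 / efficiency - 1

if __name__ == "__main__":
    print(round(efficiency, 3), round(equivalent_n), round(gain, 3))
```

The efficiency is $1/(\pi - 2)$, so the proportional gain is exactly $\pi - 3$, or about 14 per cent.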


Sufficient Estimators

17.14. The comparison of the efficiencies of two estimators, as measured by their
variances, may be made for any $n$, but the absolute efficiency as defined in 17.12 by relation
to a most-efficient estimator is in the main a limiting property. We shall see below (17.36)
that the definition may be extended to small samples and to non-normal variation, but
most-efficient estimators for finite $n$ do not exist so frequently in statistical practice
as in the limiting case of large samples. Sometimes, however, there are estimators which
may be regarded as the “best” for samples of any size, and we proceed to consider
them.

Before doing so, we prove that, in the limit, all most-efficient estimators tend to
equivalence.


More precisely, if two most-efficient estimators $t_1$ and $t_2$ tend in the limit to be
distributed in the bivariate form

$$ dF \propto \exp\left[-\frac{1}{2v(1 - \rho^2)}\left\{(t_1 - \theta)^2 - 2\rho(t_1 - \theta)(t_2 - \theta) + (t_2 - \theta)^2\right\}\right] dt_1\,dt_2, \tag{17.19} $$

then the correlation $\rho = 1$. Here $v$ is the variance of each estimator.
Consider the estimator

$$ u_1 = \tfrac{1}{2}(t_1 + t_2). $$

Clearly $u_1$ is consistent since $t_1$ and $t_2$ are both so. Putting

$$ u_2 = \tfrac{1}{2}(t_1 - t_2) $$

we have, for the joint distribution of $u_1$ and $u_2$,

$$ dF \propto \exp\left[-\frac{1}{v(1 - \rho^2)}\left\{(1 - \rho)(u_1 - \theta)^2 + (1 + \rho)u_2^2\right\}\right] du_1\,du_2. \tag{17.20} $$

Thus $u_1$ is distributed independently of $u_2$ and we have

$$ \text{var } u_1 = \tfrac{1}{2}v(1 + \rho). \tag{17.21} $$

Now $t_1$ is a most-efficient estimator and hence

$$ \tfrac{1}{2}v(1 + \rho) = \text{var } u_1 \ge \text{var } t_1 = v, $$

giving

$$ \rho \ge 1. \tag{17.22} $$

But $\rho$ cannot be greater than unity and hence $\rho = 1$, which proves the theorem.


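17.15. Consider once again the estimation of $\theta$ in the normal population (17.1).
The joint distribution of the sample is given by

$$ dF = \frac{1}{(2\pi)^{n/2}} \exp\left\{-\frac{1}{2}\sum_{j=1}^{n}(x_j - \theta)^2\right\} dx_1 \ldots dx_n. \tag{17.23} $$

We have the familiar result

$$ \sum_{j=1}^{n}(x_j - \theta)^2 = \sum_{j=1}^{n}(x_j - \bar{x})^2 + n(\bar{x} - \theta)^2, $$

and hence

$$ dF = \frac{1}{(2\pi)^{n/2}} \exp\left\{-\frac{n}{2}(\bar{x} - \theta)^2\right\} \exp\left\{-\frac{1}{2}\sum (x - \bar{x})^2\right\} dx_1 \ldots dx_n. \tag{17.24} $$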
Thus the frequency function of the distribution of $x$'s (which is equivalent to the likelihood
function) can be factorised into two parts, one depending on $\bar{x}$ and $\theta$, the other depending
on the $x$'s but not on $\theta$.
The quantity $\bar{x}$ is then said to be a sufficient estimator of $\theta$; and generally, if the
likelihood function is expressible in the form (as a product of two frequency functions)

$$ L(x_1, \ldots, x_n, \theta) = L_1(t, \theta)\,L_2(x_1, \ldots, x_n), \tag{17.25} $$

where $L_1$ does not contain the $x$'s otherwise than in the form $t$ and $L_2$ is independent of $\theta$,
$t$ is said to be a sufficient estimator of $\theta$.
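
The factorisation of 17.15 has a simple numerical consequence: for the unit-variance normal population, two samples of the same size with the same mean give the same ratio of likelihoods for any two values of the parameter, however the individual observations differ. The sketch below illustrates this; the function names and the two samples are hypothetical choices made for the illustration.

```python
import math


def log_likelihood(xs, theta):
    """Log of the joint frequency function for unit-variance normal observations."""
    n = len(xs)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * sum((x - theta) ** 2 for x in xs)


def split_sum_of_squares(xs, theta):
    """Return the two parts of sum (x - theta)^2 used in the factorisation."""
    n = len(xs)
    xbar = sum(xs) / n
    within = sum((x - xbar) ** 2 for x in xs)   # free of theta
    between = n * (xbar - theta) ** 2           # involves the sample only through xbar
    return within, between


def log_likelihood_ratio(xs, theta_a, theta_b):
    """log L(theta_a) - log L(theta_b); the theta-free factor cancels."""
    return log_likelihood(xs, theta_a) - log_likelihood(xs, theta_b)


if __name__ == "__main__":
    sample_1 = [1.0, 2.0, 3.0]
    sample_2 = [0.0, 2.0, 4.0]   # different values, same mean
    print(log_likelihood_ratio(sample_1, 0.5, 1.5),
          log_likelihood_ratio(sample_2, 0.5, 1.5))
```

Both ratios equal $-\tfrac{n}{2}\{(\bar{x} - 0.5)^2 - (\bar{x} - 1.5)^2\} = -3$ for these samples, since only the factor containing $\bar{x}$ and the parameter survives in the ratio.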


17.16. As so defined, a sufficient estimator, if it exists at all, is unique except that
if $t$ obeys the relation (17.25) any function of $t$ will obviously also obey the same relation.
From all such functions we must evidently choose one which gives a consistent estimator
and may, as in the example of the previous section, find the estimator which is
unbiassed. Apart from such ambiguities, which offer no difficulties in practice, the property
of uniqueness holds. For if $t_1$ and $t_2$ were two different sufficient statistics, not functionally
related, we should have

$$ L_1(t_1, \theta)\,L_2(x_1, \ldots, x_n) = M_1(t_2, \theta)\,M_2(x_1, \ldots, x_n) $$

and hence

$$ \frac{L_1(t_1, \theta)}{M_1(t_2, \theta)} = \frac{M_2}{L_2}. \tag{17.26} $$

Since the expression on the right does not contain $\theta$, $L_1$ must be a factor of $M_1$ and
moreover the quotient must be a constant; for if it were a function of the $x$'s that function
would have been assimilated to $L_2$ or $M_2$. Hence

$$ L_1(t_1, \theta) = k\,M_1(t_2, \theta), $$

and this cannot be so unless $t_1$ and $t_2$ are functionally related.


17.17. The fundamental property of sufficient estimators derives from the following
theorem:
If $t_1$ is sufficient and $t_2$ is any other estimator of $\theta$ (not a function of $t_1$) the joint
distribution of $t_1$ and $t_2$ may be put in the form

$$ dF = f_1(t_1, \theta)\,f_2(t_1, t_2)\,dt_1\,dt_2, \tag{17.27} $$

where $f_2$ does not contain $\theta$. Conversely, if (17.27) holds for every $t_2$ then $t_1$ is sufficient.
Before proving this result let us notice its importance. From (17.27) it follows that
for any given $t_1$ the distribution of $t_2$ is equal to $f_2(t_1, t_2)\,dt_2$, i.e. is independent of $\theta$.
Consequently, if we know $t_1$, the probability of any range of values of $t_2$ is the same for all $\theta$.
The distribution of $t_2$ given $t_1$, therefore, can throw no light whatever on $\theta$. Thus, a
knowledge of $t_1$ gives all the information that the sample can supply about $\theta$ and no other
estimator can add anything to it. We are clearly justified in such circumstances in
describing a sufficient estimator as the “best”.
Now as to the theorem itself. The direct part is easily proved. In fact, we have from
(17.25)

$$ L(x_1, \ldots, x_n, \theta)\,dx_1 \ldots dx_n = L_1(t_1, \theta)\,L_2(x_1, \ldots, x_n)\,dx_1 \ldots dx_n. $$

Make the transformation

$$ y_1 = t_1(x_1, \ldots, x_n), \quad y_2 = t_2(x_1, \ldots, x_n), \quad y_3 = x_3, \quad \ldots, \quad y_n = x_n. \tag{17.28} $$

The element of frequency becomes

$$ L_1(t_1, \theta)\,L_2(x_1, \ldots, x_n)\,\frac{\partial(x_1, x_2)}{\partial(t_1, t_2)}\,dy_1 \ldots dy_n, \tag{17.29} $$

where the $t$'s and $x$'s are to be expressed in terms of the $y$'s. We have excluded the case
when $t_2$ is functionally related to $t_1$, and hence the Jacobian $\partial(x_1, x_2)/\partial(t_1, t_2)$ does not
vanish identically. The frequency element of $y_1$ and $y_2$ is then obtained from (17.29) by
integrating out the other variables. Since $y_1$ and $y_2$ are equal respectively to $t_1$ and $t_2$
this process will leave unchanged the function $L_1(t_1, \theta)$ and reduce the other part to a
function of $t_1$ and $t_2$, say $f_2(t_1, t_2)$. Writing $f_1$ for $L_1$ we then have

$$ dF = f_1(t_1, \theta)\,f_2(t_1, t_2)\,dt_1\,dt_2, $$

as stated in the theorem.
The converse is a little more difficult. Let $t_1$ be sufficient and make the transformation
$y_1 = t_1$, $y_2 = x_2$, etc. The joint distribution of sample values becomes

$$ L'(y_1, \ldots, y_n) = L(x_1, \ldots, x_n) \Big/ \frac{\partial t_1}{\partial x_1}. \tag{17.30} $$

Since $t_1$ is independent of $\theta$, so is $\partial t_1/\partial x_1$. Hence, if the distribution of $t_1$ is $f(t_1)\,dt_1$, $L'$ may
be written

$$ L'(y_1, \ldots, y_n) = f(t_1)\,L''(t_1, y_2, \ldots, y_n), \tag{17.31} $$

and the converse will be established if we can show that $L''$ does not contain $\theta$. This we
do by demonstrating that if there are values $y_2', \ldots, y_n'$ for which $L''$ assumes different
values for different values of $\theta$ then the joint distribution of $t_1$ and $t_2$ cannot be independent
of $\theta$, which contradicts our hypothesis.

Suppose, then, that for two values of $\theta$, say $\theta_1$ and $\theta_2$,

$$ L''(t_1, y_2', \ldots, y_n')_{\theta_1} - L''(t_1, y_2', \ldots, y_n')_{\theta_2} > \epsilon, \tag{17.32} $$

where $\epsilon$ is not zero. Consider a new statistic $t_2$ defined by

$$ t_2^2 = \sum_{j=2}^{n} (y_j - y_j')^2. \tag{17.33} $$

Assuming that $L''$ is continuous in the $y$'s, we may determine a value of $t_2$, say $t_2'$, such that

$$ L''(t_1, y_2, \ldots, y_n)_{\theta_1} > L''(t_1, y_2, \ldots, y_n)_{\theta_2} \tag{17.34} $$

everywhere inside the range of values bounded by

$$ t_2'^2 = \sum (y - y')^2. $$

Then for any fixed $t_1$ the total frequency inside this range is obtained by integrating $L''$
over the appropriate values, and we shall find, in virtue of (17.34),

$$ f_{\theta_1} > f_{\theta_2}, \tag{17.35} $$

the $f$'s referring to total frequencies.
But if the joint distribution of $t_1$ and $t_2$ is

$$ dF = h(t_1, t_2)_{\theta}\,dt_1\,dt_2 $$

we have for the frequencies

$$ f_{\theta_1} = \int_0^{t_2'} h(t_1, t_2)_{\theta_1}\,dt_2, \qquad f_{\theta_2} = \int_0^{t_2'} h(t_1, t_2)_{\theta_2}\,dt_2, $$

and hence

$$ \int_0^{t_2'} \left\{h(t_1, t_2)_{\theta_1} - h(t_1, t_2)_{\theta_2}\right\} dt_2 > 0, $$

so that the joint distribution cannot be independent of $\theta$.
The above demonstration relates to the case when the frequency functions are
continuous. In the discontinuous case the argument simplifies and we leave it to the reader
to supply the proof.


17.18. We now prove an important further result to the effect that a sufficient
estimator is most-efficient, provided that a most-efficient estimator exists. We assume
that the joint distribution of the sufficient estimator $t_1$ and any other estimator $t_2$ tends
to normality for large $n$, say in the form

$$ dF \propto \exp\left[-\frac{1}{2(1 - \rho^2)}\left\{\frac{(t_1 - \theta)^2}{v_1} - \frac{2\rho(t_1 - \theta)(t_2 - \theta)}{\sqrt{v_1 v_2}} + \frac{(t_2 - \theta)^2}{v_2}\right\}\right] dt_1\,dt_2, \tag{17.36} $$

where $v_1$ and $v_2$ are the variances of $t_1$ and $t_2$ respectively. Since $t_1$ is sufficient, the
distribution of $t_2$ given $t_1$ does not contain $\theta$. Now the distribution of $t_1$ is

$$ dF \propto \exp\left\{-\frac{(t_1 - \theta)^2}{2v_1}\right\} dt_1, \tag{17.37} $$

and hence that of $t_2$ given $t_1$ is

$$ dF \propto \exp\left[-\frac{1}{2(1 - \rho^2)}\left\{\frac{\rho^2(t_1 - \theta)^2}{v_1} - \frac{2\rho(t_1 - \theta)(t_2 - \theta)}{\sqrt{v_1 v_2}} + \frac{(t_2 - \theta)^2}{v_2}\right\}\right] dt_2, $$

which reduces to

$$ dF \propto \exp\left[-\frac{1}{2(1 - \rho^2)}\left\{\frac{t_2 - \theta}{\sqrt{v_2}} - \frac{\rho(t_1 - \theta)}{\sqrt{v_1}}\right\}^2\right] dt_2. \tag{17.38} $$

If this is not to involve $\theta$ the coefficient of $\theta$ inside the braces must vanish, so that we must have

$$ \rho = \sqrt{\frac{v_1}{v_2}} = \sqrt{E}, \quad \text{where } E \text{ is the efficiency of } t_2. \tag{17.39} $$

Since $\rho \le 1$ it follows that $v_1 \le v_2$, i.e. $t_1$ has a variance not exceeding that of any other estimator.
Consequently, if there exists a most-efficient statistic, $t_1$ itself is most-efficient.


17.19. The criterion of sufficiency is not a limiting property. A sufficient estimator
is best for any sample size since it gives all the information about $\theta$ that the sample can
give; and it is most-efficient for large samples. If we could always find a sufficient
estimator our problem would be solved, but unfortunately sufficiency is the exception
rather than the rule.


Example 17.6

The frequency element of a sample of $n$ from the population

$$ dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x - m)^2}{2\sigma^2}\right\} dx $$

can be put in the form

$$ dF \propto \left(\frac{n}{2\pi\sigma^2}\right)^{1/2} \exp\left\{-\frac{n(\bar{x} - m)^2}{2\sigma^2}\right\} \cdot \frac{1}{\Gamma\left(\frac{n - 1}{2}\right)}\left(\frac{n}{2\sigma^2}\right)^{\frac{n - 1}{2}} \exp\left\{-\frac{n s^2}{2\sigma^2}\right\} s^{n - 3}\,d\bar{x}\,ds^2. $$

(Cf. Example 10.5, vol. I, p. 238.)

If we know $\sigma$, then, as we have already seen, $\bar{x}$ is sufficient for $m$. But if we know
$m$, $s$ is not sufficient for $\sigma$. In fact, the factorisation in the above equation requires the
appearance of $\sigma$ in the element relating to $\bar{x}$, and we cannot separate a factor containing
$s$ and $\sigma$ alone or the remaining variables alone.

This is what we might expect. If we know the real mean $m$ there is little point in
preferring the sample variance

$$ s^2 = \frac{1}{n}\sum (x - \bar{x})^2 $$

to the second moment

$$ s'^2 = \frac{1}{n}\sum (x - m)^2 $$

as an estimator of the parent variance. The distribution of $s'$ is given by

$$ dF = \frac{1}{\Gamma\left(\frac{n}{2}\right)}\left(\frac{n}{2\sigma^2}\right)^{\frac{n}{2}} \exp\left\{-\frac{n s'^2}{2\sigma^2}\right\} (s'^2)^{\frac{n - 2}{2}}\,ds'^2, $$

and this embodies the whole of the frequency element of the sample, apart from differentials
in the other variables. Thus $s'$ is sufficient for $\sigma$.


17.20. This completes the first stage of our inquiry. The criteria of consistence,
efficiency and sufficiency provide standards which we shall look for in “good” estimators.
Of themselves, however, they do not provide any systematic way of deriving estimators
which obey them. We shall now consider various methods which have been proposed for
providing estimators and examine how far they conform to our criteria. The most
important method is that of maximum likelihood, which will occupy the remainder of this
chapter. In the next chapter we shall consider four others, the method of minimum
variance, the method of minimum $\chi^2$, the method of least squares, and the method of
inverse probability.


Maximum Likelihood 
17.21. If the frequency function of the parent population is f (x, θ), the likelihood
function of a sample of n is, by definition,

\[ L = f(x_1, \theta)\, f(x_2, \theta) \cdots f(x_n, \theta). \qquad (17.40) \]

The Principle of Maximum (or Maximal) Likelihood then states that if there exists a statistic
t = t (x₁, . . ., xₙ) which maximises L for variations of θ, then t is to be taken as an
estimator of θ. In short, t is the solution (if any) of

\[ \frac{\partial L}{\partial\theta} = 0, \qquad \frac{\partial^2 L}{\partial\theta^2} < 0. \qquad (17.41) \]

Since L is positive, the first equation is equivalent to

\[ \frac{1}{L}\frac{\partial L}{\partial\theta} = \frac{\partial \log L}{\partial\theta} = 0, \qquad (17.42) \]

a form which is frequently more convenient.
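
As a minimal sketch of the principle, the following Python fragment maximises log L numerically for a normal parent with known standard deviation and compares the result with the sample mean. The data, the function names and the golden-section search are illustrative assumptions, not part of the text; any one-dimensional maximiser would serve.

```python
import math

def log_likelihood(theta, xs, sigma=1.0):
    """log L = sum of log f(x, theta) for a normal parent with known sigma."""
    const = -math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(const - 0.5 * ((x - theta) / sigma) ** 2 for x in xs)

def maximise(fn, lo, hi, tol=1e-10):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if fn(c) > fn(d):
            b = d          # the maximum lies in [a, d]
        else:
            a = c          # the maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)

sample = [4.1, 5.3, 3.8, 6.0, 4.9, 5.5]   # illustrative data, not from the text
t = maximise(lambda th: log_likelihood(th, sample), min(sample), max(sample))
sample_mean = sum(sample) / len(sample)
print(t, sample_mean)
```

Working with log L rather than L itself, as (17.42) suggests, avoids the underflow that a product of many small frequencies would cause.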

There is one small point to notice here. In our usual convention, if a frequency
function has a finite range, we regard it as defined from −∞ to +∞ but as zero outside
that range. In this chapter we shall occasionally meet the reciprocal of f, which is undefined
for zero f. Unless the contrary is specified we shall suppose that where f is zero 1/f is also
to be regarded as zero. This will enable us to continue to regard the range as infinite, but
some care is necessary where f is assumed everywhere continuous, for discontinuities may
appear in f and 1/f at the terminals of the finite range. The point becomes important
when we try to make certain existence theorems rigorous.


17.22. In sections 7.27 to 7.31 we touched on the principle of maximum likelihood
from the point of view of statistical logic. We pointed out that its adoption required a
new postulate in the theory of inference, but referred to the fact that the principle was
recommended by the statistical properties of the estimators to which it leads. We now
proceed to prove a series of theorems about these estimators, from which it will be seen
that the posterior recommendation, so to speak, is very strong. In fact, maximum
likelihood estimators are consistent, tend to normality for large n, have minimum variance
in the limit at least, and provide sufficient statistics where such exist.


17.23. The reader may feel convinced intuitively that maximum likelihood estimators
are consistent, in which case he can pass to the next section. We shall now prove the
result formally.

(a) If the frequency function f (x, θ) is continuous in x throughout its range, and

(b) if f (x, θ) is continuous and monotonic in θ in some θ-interval containing the true
value of θ, say θ₀, and for all x in some x-interval,

then the maximum likelihood estimator of θ, say t, is consistent.

Our proof will also cover the case of discontinuous variates which can be reduced to
the continuous case by replacing each value by an interval in which the frequency is
uniformly distributed.

We first eliminate an inconvenience due to the infinitude of the range. In fact, if the
range is infinite we make the variate transformation x = tan y. The conditions (a) and (b)
remain true of y, and the maximum likelihood estimator in x transforms to that in y. We
may therefore take the range as finite.

The next step is to reduce the case to one of grouped frequencies by dividing the range
into m intervals, the width of the jth interval being l_j. (We shall decide on the actual
values of the l's below.) Writing


\[ f_j(\theta) = \int_{l_j} f(x, \theta)\, dx, \qquad (17.43) \]

we have, in virtue of the continuity of f in x, that f_j/l_j differs as little as we please from
f (x, θ). Then if L′ is the likelihood of the grouped data, proportional to

\[ \prod_{j=1}^{m} \left(\frac{f_j}{l_j}\right)^{n_j}, \qquad (17.44) \]

where n_j is the number of observations in the jth interval, we have, except for constants,

\[ \log L' = \sum_{j=1}^{m} n_j \log f_j - \sum_{j=1}^{m} n_j \log l_j, \qquad (17.45) \]

and this will differ arbitrarily little from the logarithm of the true likelihood

\[ \log L = \sum_{i=1}^{n} \log f(x_i, \theta), \qquad (17.46) \]

provided that we take m large enough and the l's in consequence small enough.

Hence we see that if t is the estimator which maximises L and t′ that which maximises
L′, in virtue of hypothesis (b) that L and L′ are continuous in θ, t and t′ will differ as little
as we please for any given values of the x's and that uniformly. We may therefore prove
our theorem for the finite number of variables n_j and infer its truth for the continuous
case by proceeding to the limit.

In different samples the n_j will vary, subject only to the condition that Σ (n_j) = n.
Let us choose the ranges l_j such that f_j (θ₀) = 1/m for all j, that is to say, such that the
frequencies in all intervals are equal when θ takes its true value θ₀. Consider the likelihood
function

\[ K = \sum_{j=1}^{m} n_j \log z_j, \qquad (17.47) \]

where the z's are subject only to the condition

\[ \sum_{j=1}^{m} z_j = 1. \qquad (17.48) \]


14 ESTIMATION: LIKELIHOOD 


We consider three values of K defined by particular values of the z's.

(a) When z_j = n_j/n, K is a maximum, say K_R. For we have

\[ \delta K = \sum_{j} \frac{n_j}{z_j}\, \delta z_j = 0 \quad \text{subject to} \quad \sum_{j} \delta z_j = 0, \]

and hence

\[ z_j = \frac{n_j}{n}. \]

(b) When z_j = f_j (θ₀) = 1/m, K is, say, K_M.

(c) When the estimator t′ assumes the value, say, t₀ corresponding to the n_j's, and
hence z_j = f_j (t₀), K is a maximum, say K_Z, among the particular set of values of θ for
which z_j = f_j (θ); for this is our definition of t′.

We have at once that

\[ K_R \ge K_Z \ge K_M. \qquad (17.49) \]

Now, as the sample increases, the observed n_j/n converge in probability to their
theoretical values f_j (θ₀) = 1/m. Since K is continuous in the z's, K_R − K_M will converge
to zero in probability and, from (17.49), so will K_R − K_Z.

Now we show that this entails that each of

\[ | f_j(t_0) - f_j(\theta_0) | \]

converges to zero in probability. In fact, since | f_j (θ₀) − n_j/n | does so, it will be enough
to prove that the same holds for

\[ \left| f_j(t_0) - \frac{n_j}{n} \right|. \qquad (17.50) \]

Let K₁ be the maximum of K for some fixed z₁. Then K₁ ≥ K_Z when z₁ = f₁ (t₀), and

\[ K_R - K_Z \ge K_R - K_1. \]

Hence K_R − K₁ converges to zero. The maximum K₁ is readily seen to be given by

\[ z_j = \frac{n_j (1 - z_1)}{n - n_1}, \qquad j = 2, \ldots, m, \qquad (17.51) \]

\[ K_1 = n_1 \log z_1 + (n - n_1)\{\log (1 - z_1) - \log (n - n_1)\} + \sum_{j=2}^{m} n_j \log n_j. \qquad (17.52) \]

Now z₁ is a double-valued function of K₁, continuous and having its two values equal
for K₁ = K_R; for K₁ is continuous in z₁ from 0 to 1 (not inclusive), and ∂K₁/∂z₁ changes sign
only for z₁ = n₁/n, where K₁ = K_R. It follows that when K_R − K₁ is small, so is
z₁ − n₁/n. If the other z's are not given by (17.51), K is less than K₁, so that K_R − K₁ is
smaller still than K_R − K.

A similar argument applies for any j, and hence | z_j − n_j/n | converges to zero in
probability when K_R − K does so. Taking z_j = f_j (t₀) and remembering that in this case K
becomes K_Z, we reach (17.50).

Finally, by hypotheses (a) and (b) at least some of the f_j (θ) have continuous inverse
functions expressing θ in terms of the functions f_j, and hence by taking

\[ | f_j(t_0) - f_j(\theta_0) | \]

as small as we please, we may make t₀ − θ₀ as small as we please. Consequently t′ converges
to θ₀ in probability and is consistent.
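
The ordering (17.49) can be checked numerically. The sketch below is illustrative only: the three-cell family `cell_probs` and the counts are assumptions chosen so that every f_j equals 1/m at the true value θ₀ = 0, and the grid search merely stands in for the exact maximisation that defines t′.

```python
import math

def K(counts, z):
    """K = sum_j n_j log z_j, as in (17.47)."""
    return sum(n_j * math.log(z_j) for n_j, z_j in zip(counts, z))

def cell_probs(theta):
    """Hypothetical one-parameter family with f_j(0) = 1/3 for every cell."""
    return [1.0 / 3.0 + theta, 1.0 / 3.0, 1.0 / 3.0 - theta]

counts = [40, 33, 27]                 # illustrative grouped sample, n = 100
n = sum(counts)

k_r = K(counts, [c / n for c in counts])      # z_j = n_j / n
k_m = K(counts, cell_probs(0.0))              # z_j = f_j(theta_0) = 1/m

# t' maximises K along the curve z_j = f_j(theta); a fine grid search suffices here.
grid = [i / 10000.0 for i in range(-3000, 3001)]
t_prime = max(grid, key=lambda th: K(counts, cell_probs(th)))
k_z = K(counts, cell_probs(t_prime))

print(k_r, k_z, k_m, t_prime)
```

For this family the maximising value can also be found by hand: setting the derivative of K to zero gives θ = 13/201, which the grid search reproduces to within its step.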


17.24. The reader may find the foregoing proof easier to follow if we express its
main points in geometrical terminology.

Consider the m proportions n_j/n as the co-ordinates of a point in a space of m
dimensions. The theoretical frequencies f_j (θ₀) = 1/m define a point, say M, in
this space, and the sample point R, corresponding to an observed set of n_j's, may
be regarded as varying round the “theoretical” point M. The quantities z are
the co-ordinates of any point in the hyperplane Σ (z) = 1, which contains M and R.
(See Fig. 17.1.)

[Fig. 17.1.]

Now, for any sample point R the maximum likelihood estimator t′ assumes
a value t₀ which in general differs from θ₀. This value defines m quantities f_j (t₀)
which determine a point Z. This also lies in the hyperplane since the sum of
the frequencies is unity. Thus the points R determine a set of points Z which all lie on
the curve defined for variations in θ by

\[ z_j = f_j(\theta). \qquad (17.53) \]

Since θ = θ₀ is a possible value of θ, the point M lies on this curve; R in general does
not.

What we have shown in analytical form is that the function K, which is the logarithm
of a likelihood function defined for any point on the hyperplane, has a maximum at R
and a maximum on the curve itself at Z. As the sample size increases, R is as near as
we like to M (in the sense of convergence in probability, that is to say, that as high a
proportion of points R as we like are as near as we like to M). This involves that Z also is as
near as we like to M. This in turn involves that the parameter-value t₀ corresponding to
Z is as close as we like to θ₀ for as high a proportion of the possible points Z as we like,
which is our theorem.


17.25. We now prove a second fundamental property of maximum likelihood
estimators, namely that they tend to normality for large n. More precisely,

(a) If condition (a) at the beginning of 17.23 is satisfied; and if (more stringently
than condition (b) of that section)

(c) in a θ-interval containing the true value θ₀, ∂f/∂θ is continuous in θ for every x,
∂f/∂θ approaches a continuous function of θ as x tends to infinity, and ∂f/∂θ does not
vanish in some interval,

then the maximum likelihood estimator t tends to normality for large n. The condition
as to ∂f/∂θ ensures that in the transformation to finite range ∂f/∂θ remains continuous in θ
throughout that range.

We recall that if

\[ \xi_j = \frac{n_j}{n} - \frac{1}{m}, \qquad (17.54) \]

that is, if the ξ's are the deviations of the actual proportional frequencies n_j/n from the
“expected” frequencies 1/m, the distribution of the ξ's in the limit will be normal and their
distribution spherically symmetric. Consider again the orthogonal space of the previous
section. The sample points are distributed about the point M in a symmetrical form which
tends to normality. If we choose a set of orthogonal axes in the hyperplane, the projection
of the sample points on any axis is in the limit distributed normally with variance 1/mn.

In the neighbourhood of M the curve (17.53) approaches its tangent line as n becomes
larger, and we therefore have, if s is the distance along the tangent from M,

\[ s = (\theta - \theta_0) \left\{ \sum_{j=1}^{m} \left(\frac{\partial f_j}{\partial\theta}\right)^2 \right\}^{\frac{1}{2}}, \qquad (17.55) \]

as follows from (17.53). (The tangent exists in virtue of our hypothesis as to the differential
coefficients of f in θ.)

Now consider the point Z on the curve corresponding to the sample point R. We
know that at Z the function

\[ K = \sum_{j} n_j \log \left(z_j + \frac{1}{m}\right), \qquad (17.56) \]

where we now measure z from M, is a maximum for variations in z such that Z lies on
the curve. R is determined by finding the hypersurface (17.56) tangent to the hyperplane
Σ (z) = 0, for at that point the first-order variation of K within the hyperplane is zero. We
know that the co-ordinates of this point are z_j = n_j/n − 1/m and that R is the point of
tangency. K_R as defined in 17.23 is the value of K at R, and K_Z is that at Z. We then have,
by Taylor's theorem,

\[ K_Z = K_R + \sum_{j} \left(\frac{\partial K}{\partial z_j}\right)_R \delta z_j
 + \frac{1}{2} \sum_{j}\sum_{k} \left(\frac{\partial^2 K}{\partial z_j\, \partial z_k}\right)_R \delta z_j\, \delta z_k \qquad (17.57) \]

to the second order of small quantities in δz. From (17.56) we see that

\[ \left(\frac{\partial K}{\partial z_j}\right)_R = n, \qquad (17.58) \]

\[ \left(\frac{\partial^2 K}{\partial z_j\, \partial z_k}\right)_R = 0, \quad j \ne k; \qquad
 = -\frac{n^2}{n_j}, \quad j = k. \qquad (17.59) \]

Hence

\[ K_Z = K_R + n \sum_{j} \delta z_j - \frac{n^2}{2} \sum_{j} \frac{(\delta z_j)^2}{n_j}. \qquad (17.60) \]

Now Σ (δz) = 0, for the variation takes place in the hyperplane. Hence, for given R,
Z is the point for which Σ (δz_j)²/n_j is a minimum. As n tends to infinity the n_j's tend to
equality, and hence Z is the point on the curve which is nearest to R. Thus R is, in the
limit, projected orthogonally on to the curve, that is to say, in the limit, on the tangent
line.

Now we know that these points are distributed normally with variance 1/mn and
this proves the theorem. We may also evaluate the variance of the maximum likelihood
estimator; for

\[ \operatorname{var} t' = \frac{1}{mn \sum_{j} \left(\dfrac{\partial f_j}{\partial\theta}\right)^2}, \qquad (17.61) \]

and since t′ approaches t for fine grouping we have also, remembering that 1/m = f_j (θ₀),

\[ \frac{1}{\operatorname{var} t} = n \int \frac{1}{f} \left(\frac{\partial f}{\partial\theta}\right)^2 dx
 = n \int \left(\frac{\partial \log f}{\partial\theta}\right)^2 f\, dx, \qquad (17.62) \]

where θ is to be put equal to θ₀ on the right.

It may be remarked that condition (c) at the beginning of the section prevents the
vanishing of ∂f/∂θ which might render the expression (17.61) nugatory.


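17.26. We have, then, under the afore-mentioned conditions,

\[ \frac{1}{\operatorname{var} t} = n \int \left(\frac{\partial \log f}{\partial\theta}\right)^2 f\, dx. \]

If the range is independent of θ, or if f and ∂f/∂θ vanish at any extremity of the range which
depends on θ, we have the alternative form

\[ \frac{1}{\operatorname{var} t} = -n \int \frac{\partial^2 \log f}{\partial\theta^2}\, f\, dx. \qquad (17.63) \]

In fact, since ∫ f dx = 1 between the limits a and b, where a, b are the limits of the range and
may contain θ, we have *

\[ 0 = \frac{\partial}{\partial\theta} \int_a^b f\, dx
 = \int_a^b \frac{\partial \log f}{\partial\theta}\, f\, dx + f(b, \theta) \frac{\partial b}{\partial\theta} - f(a, \theta) \frac{\partial a}{\partial\theta}
 = \int_a^b \frac{\partial \log f}{\partial\theta}\, f\, dx. \]

Differentiating again, we have

\[ 0 = \int_a^b \frac{\partial^2 \log f}{\partial\theta^2}\, f\, dx + \int_a^b \left(\frac{\partial \log f}{\partial\theta}\right)^2 f\, dx
 + \left(\frac{\partial f}{\partial\theta}\right)_{x=b} \frac{\partial b}{\partial\theta} - \left(\frac{\partial f}{\partial\theta}\right)_{x=a} \frac{\partial a}{\partial\theta}. \qquad (17.64) \]

Again, if the range is independent of θ or if ∂f/∂θ vanishes at the extremity, the last two
terms on the right in (17.64) are zero, and we have (reverting to our usual convention as
to limits)

\[ -\int \frac{\partial^2 \log f}{\partial\theta^2}\, f\, dx = \int \left(\frac{\partial \log f}{\partial\theta}\right)^2 f\, dx, \]

and the result follows from (17.62).

* The operation of differentiating under the integral sign requires certain conditions as to uniform
convergence, even when the limits are independent of θ. To avoid prolixity we shall always assume
that the conditions hold unless the contrary is stated. The point gives rise to no statistical difficulty
but is troublesome when one is aiming at complete mathematical rigour.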
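
The equality of the two integrals can be illustrated by quadrature. The sketch below takes, purely as an assumed example, a normal density with known mean and treats its standard deviation σ as the parameter; the trapezoidal rule and the truncation of the range at twelve standard deviations are conveniences, not part of the argument. Both integrals should come out close to 2/σ².

```python
import math

def density(x, sigma, m=0.0):
    return math.exp(-0.5 * ((x - m) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def score(x, sigma, m=0.0):
    """d log f / d sigma."""
    return -1.0 / sigma + (x - m) ** 2 / sigma ** 3

def second_derivative(x, sigma, m=0.0):
    """d^2 log f / d sigma^2."""
    return 1.0 / sigma ** 2 - 3.0 * (x - m) ** 2 / sigma ** 4

def integrate(fn, lo, hi, steps=20000):
    """Composite trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.5 * (fn(lo) + fn(hi))
    for i in range(1, steps):
        total += fn(lo + i * h)
    return total * h

sigma = 1.5
lo, hi = -12.0 * sigma, 12.0 * sigma
mean_sq_score = integrate(lambda x: score(x, sigma) ** 2 * density(x, sigma), lo, hi)
minus_mean_second = integrate(lambda x: -second_derivative(x, sigma) * density(x, sigma), lo, hi)
print(mean_sq_score, minus_mean_second, 2.0 / sigma ** 2)
```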


17.27. We now prove a third fundamental property concerning the efficiency of
maximum likelihood estimates.

If t be any estimator of θ, the range of f (x, θ) is independent of θ, and in large samples
t is distributed normally about mean θ₀ (the true value of θ) with variance v; then

\[ \frac{1}{v} \ \text{cannot exceed} \ n \int \left(\frac{\partial \log f}{\partial\theta}\right)^2 f\, dx \ \text{with} \ \theta = \theta_0; \]

and hence, if a maximum likelihood estimator exists, it is most-efficient in the class of
such estimators.
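By hypothesis, we have in the limit for the frequency function of t,

\[ \phi = \frac{1}{\sqrt{2\pi v}} \exp\left\{-\frac{(t - \theta)^2}{2v}\right\} \qquad (17.65) \]

and hence

\[ -\frac{\partial^2 \log \phi}{\partial\theta^2} = \frac{1}{v}, \qquad (17.66) \]

where, for convenience, we drop the suffix of θ until the end of the proof. We then have

\[ \frac{1}{v} = -\int \frac{\partial^2 \log \phi}{\partial\theta^2}\, \phi\, dt
 = \int \left(\frac{\partial \log \phi}{\partial\theta}\right)^2 \phi\, dt. \qquad (17.67) \]

Now consider

\[ u = \frac{\partial}{\partial\theta} (\log L) \qquad (17.68) \]

as a random variable over the possible values x₁ . . . xₙ conditioned by t = constant.
Since the frequency of u is L, we have

\[ \operatorname{var} u = \frac{\Sigma (L u^2)}{\Sigma (L)} - \frac{\{\Sigma (L u)\}^2}{\{\Sigma (L)\}^2}, \qquad (17.69) \]

with summation (or integration) over the range of x's. Now φ is the frequency of all
samples having a constant t, and hence

\[ \phi = \Sigma (L). \]

Hence

\[ \operatorname{var} u = \frac{1}{\phi}\, \Sigma \left\{ \frac{1}{L} \left(\frac{\partial L}{\partial\theta}\right)^2 \right\}
 - \frac{1}{\phi^2} \left(\frac{\partial \phi}{\partial\theta}\right)^2. \qquad (17.70) \]

Now var u cannot be negative and φ is not negative, and hence

\[ \Sigma \left\{ \frac{1}{L} \left(\frac{\partial L}{\partial\theta}\right)^2 \right\}
 - \frac{1}{\phi} \left(\frac{\partial \phi}{\partial\theta}\right)^2 \ge 0. \qquad (17.71) \]

Now

\[ \frac{1}{L} \left(\frac{\partial L}{\partial\theta}\right)^2 = L \left(\frac{\partial \log L}{\partial\theta}\right)^2, \qquad
 \frac{1}{\phi} \left(\frac{\partial \phi}{\partial\theta}\right)^2 = \phi \left(\frac{\partial \log \phi}{\partial\theta}\right)^2, \]

and hence, substituting in (17.71) and integrating over all t, we have

\[ \int \Sigma \left\{ L \left(\frac{\partial \log L}{\partial\theta}\right)^2 \right\} dt
 \ge \int \phi \left(\frac{\partial \log \phi}{\partial\theta}\right)^2 dt = \frac{1}{v}. \qquad (17.72) \]

Now Σ is carried out over all x for constant t and the integration over all t, so that the two
summations together are equivalent to summation over the x's without restriction. Hence

\[ \frac{1}{v} \le \int \cdots \int \left(\frac{\partial \log L}{\partial\theta}\right)^2 L\, dx_1 \ldots dx_n
 = -\int \cdots \int \frac{\partial^2 \log L}{\partial\theta^2}\, L\, dx_1 \ldots dx_n
 = -n \int \frac{\partial^2 \log f}{\partial\theta^2}\, f\, dx
 = n \int \left(\frac{\partial \log f}{\partial\theta}\right)^2 f\, dx, \]

which establishes the result, since the expression on the right is the reciprocal of the variance
of the maximum likelihood estimator, if it exists.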


17.28. The fourth fundamental theorem of maximum likelihood estimators is as
follows:

If a sufficient estimator exists, it is a function of the maximum likelihood estimator.

In fact, the likelihood can then be put in the form

\[ L = L_1(t, \theta)\, L_2(x_1, \ldots, x_n), \]

where L₂ does not contain θ. Hence

\[ \frac{\partial}{\partial\theta} \log L = \frac{\partial}{\partial\theta} \log L_1
 = \psi(\theta, t), \ \text{a function of } \theta \text{ and } t \text{ only.} \qquad (17.73) \]

Hence, for fixed t, ∂ log L/∂θ is constant, and it follows from the previous section that the
variance of t is equal to the variance of a most-efficient estimator (for var u is then zero
for fixed t and the inequality (17.72) becomes an equality). Hence the sufficient estimator
is most-efficient, confirming the result of 17.18.

It follows from (17.73) that the maximum likelihood estimator is given by

\[ \psi(\theta, t) = 0, \qquad (17.74) \]

which proves the theorem.

Conversely, if t is such that (17.73) is true, it must be sufficient; for then we have

\[ \log L = C + \int \psi(\theta, t)\, d\theta, \]

where C does not depend on θ and the likelihood is of the requisite form.


Example 17.7 
Consider the estimation of the parameter m in the population

\[ dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{x - m}{\sigma}\right)^2\right\} dx, \qquad -\infty \le x \le \infty, \]

where σ is known. The frequency function is easily seen to obey the conditions relating
to maximum likelihood estimators. We have

\[ \log L = -n \log \{\sigma\sqrt{2\pi}\} - \frac{1}{2\sigma^2} \sum_{j=1}^{n} (x_j - m)^2, \]

and hence the maximum likelihood estimator is the root of

\[ \frac{\partial}{\partial m} \log L = \frac{1}{\sigma^2} \sum (x - m) = 0, \]

giving

\[ \hat{m} = \frac{1}{n} \sum (x) = \bar{x}. \]

It is frequently convenient to denote the estimator of a parameter by writing a
circumflex accent over it in this way.

In this case the sample mean is the maximum likelihood estimator. It is therefore
most-efficient and no other estimator can have a smaller variance in the limit. For the
variance we have, from (17.63),

\[ \frac{1}{\operatorname{var} \hat{m}} = -n \int_{-\infty}^{\infty} \frac{\partial^2 \log f}{\partial m^2}\, f\, dx = \frac{n}{\sigma^2}, \]

giving the familiar result

\[ \operatorname{var} \hat{m} = \frac{\sigma^2}{n}. \]

This, as it happens, is true for any n. The estimator is also sufficient, for

\[ \frac{\partial}{\partial m} \log L = \frac{1}{\sigma^2} (n\bar{x} - nm) \]

= a function of m and x̄ only.

The condition that σ² is known is to be noted. Complications arise when two parameters
are estimated simultaneously, as we shall see presently.


Example 17.8 
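Consider the estimation of θ in the Type III distribution

\[ dF = \frac{x^{p-1} e^{-x/\theta}}{\theta^{p}\, \Gamma(p)}\, dx, \qquad 0 \le x \le \infty, \]

where p is known.
We have

\[ \log f = (p - 1) \log x - \frac{x}{\theta} - \log \Gamma(p) - p \log \theta \]

and hence, dropping terms independent of θ,

\[ \log L = -\frac{\Sigma (x)}{\theta} - np \log \theta. \]

The equation of maximum likelihood is then

\[ \frac{1}{\theta^2} \Sigma (x) - \frac{np}{\theta} = 0, \]

giving

\[ \hat{\theta} = \frac{\bar{x}}{p}. \]

The variance is given, by (17.63), as

\[ \frac{1}{\operatorname{var} \hat{\theta}} = -n \int_0^{\infty} \left(-\frac{2x}{\theta^3} + \frac{p}{\theta^2}\right) f\, dx
 = -n \left(-\frac{2p}{\theta^2} + \frac{p}{\theta^2}\right) = \frac{np}{\theta^2}, \]

\[ \operatorname{var} \hat{\theta} = \frac{\theta^2}{np}, \]

where θ is the true value of the parameter. We could also have obtained this result directly
(and again it happens to be true for all n). From Example 10.11 (vol. I, p. 244) we have
for the distribution of x̄/p = θ̂,

\[ dF = \left(\frac{np}{\theta}\right)^{np} \frac{\hat{\theta}^{\,np-1} \exp(-np\hat{\theta}/\theta)}{\Gamma(np)}\, d\hat{\theta}, \]

from which the first two moments about the origin are

\[ \mu_1' = \theta, \qquad \mu_2' = \frac{np + 1}{np}\, \theta^2, \]

giving

\[ \operatorname{var} \hat{\theta} = \mu_2' - \mu_1'^2 = \frac{\theta^2}{np}. \]

We note that the likelihood function may be put in the form

\[ \log L = (p - 1) \Sigma \log x - n \log \Gamma(p) - \frac{np\hat{\theta}}{\theta} - np \log \theta, \]

from which it is evident that θ̂ is sufficient.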
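
The variance θ²/(np) can be checked by simulation. The following is a rough sketch only: the parameter values, the seed and the number of replications are arbitrary assumptions, and `random.gammavariate(alpha, beta)` is used with `beta` as the scale, so that its mean is `alpha * beta`, matching the form of the distribution above.

```python
import random
import statistics

random.seed(12345)
p, theta, n = 3.0, 2.0, 25        # illustrative values: p known, theta to be estimated
replications = 4000

estimates = []
for _ in range(replications):
    xs = [random.gammavariate(p, theta) for _ in range(n)]
    estimates.append(statistics.fmean(xs) / p)     # theta-hat = sample mean / p

simulated_mean = statistics.fmean(estimates)
simulated_var = statistics.variance(estimates)
theoretical_var = theta ** 2 / (n * p)
print(simulated_mean, simulated_var, theoretical_var)
```

With these settings the simulated variance should agree with θ²/(np) to within a few per cent, the residual difference being sampling error in the simulation itself.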


Example 17.9 


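Consider the estimation of the parameter λ in the Poisson distribution whose general
term is e^{−λ} λ^x / x!.

In this case the likelihood function is discontinuous and we have

\[ \log L = -n\lambda + \Sigma (x) \log \lambda - \Sigma \log (x!). \]

Hence

\[ \frac{\partial \log L}{\partial\lambda} = -n + \frac{\Sigma (x)}{\lambda} = 0, \]

giving λ̂ = x̄, the sample mean.

For the variance we have

\[ \frac{1}{\operatorname{var} \hat{\lambda}} = n \sum_{x=0}^{\infty} \frac{x}{\lambda^2}\, \frac{e^{-\lambda} \lambda^{x}}{x!} = \frac{n}{\lambda}, \]

\[ \operatorname{var} \hat{\lambda} = \frac{\lambda}{n}, \ \text{a familiar result.} \]

It is easy to see in this case also that λ̂ is sufficient.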
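
For a discontinuous variate the integral in (17.62) becomes a sum, which can be evaluated directly. The short sketch below (the values of λ and n and the truncation point are assumptions for illustration) sums (∂ log f/∂λ)² f over x and confirms that the result is 1/λ, so that var λ̂ = λ/n.

```python
import math

def poisson_pmf(x, lam):
    """General term e^(-lam) lam^x / x!, computed on the log scale for stability."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def information(lam, upper=200):
    """Sum over x of (d log f / d lambda)^2 f, truncated at `upper`."""
    return sum((x / lam - 1.0) ** 2 * poisson_pmf(x, lam) for x in range(upper + 1))

lam, n = 3.5, 40                  # illustrative values
var_mle = 1.0 / (n * information(lam))
print(information(lam), 1.0 / lam, var_mle, lam / n)
```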


Example 17.10 


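What is the most general form of distribution, differentiable in θ, for which the
sample-mean is the maximum likelihood estimator?

We are given that a solution of

\[ \frac{\partial \log L}{\partial\theta} = \Sigma \left(\frac{\partial \log f}{\partial\theta}\right) = 0 \]

is

\[ \theta = \frac{\Sigma (x)}{n} \]

or Σ (x − θ) = 0.

This is true for all x and θ, and hence

\[ \frac{\partial \log f}{\partial\theta} = K (x - \theta), \]

where K is independent of x but may be dependent on θ, say equal to ∂²ψ/∂θ². Then,
integrating,

\[ \log f = \int \frac{\partial^2 \psi}{\partial\theta^2} (x - \theta)\, d\theta
 = (x - \theta) \frac{\partial \psi}{\partial\theta} + \psi(\theta) + \xi(x), \]

where ξ (x) is an arbitrary function of x. Hence

\[ f = k \exp \left\{ (x - \theta) \frac{\partial \psi}{\partial\theta} + \psi(\theta) + \xi(x) \right\}, \]

which is the most general form of f.

If ψ (θ) = ½θ², ξ (x) = −½x², the form becomes the normal distribution

\[ f = k \exp \{-\tfrac{1}{2} (x - \theta)^2\}. \]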


Successive Approximations to Efficient Estimators 


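17.29. In the examples we have just given, the solution of the maximum likelihood
equation was carried out without difficulty. It frequently happens, however, that the
equation is by no means so easy to solve explicitly, though it can sometimes be solved
for particular values of x by iterative methods. Another possibility is to compute an
inefficient estimator and correct it by an extra term, which can be obtained as follows:

Let t′ be an inefficient estimator and t a most-efficient estimator. Let

\[ \delta = t' - t. \]

Then

\[ \operatorname{var} \delta = \operatorname{var} t' + \operatorname{var} t - 2 \operatorname{cov} (t', t). \qquad (17.75) \]

Remembering that if E is the efficiency of t′,

\[ \operatorname{var} t = E \operatorname{var} t', \qquad
 \frac{\operatorname{cov} (t', t)}{\sqrt{\operatorname{var} t\, \operatorname{var} t'}} = \sqrt{E} \quad \text{(see (17.39))}, \]

we have

\[ \operatorname{var} \delta = \frac{1 - E}{E} \operatorname{var} t. \qquad (17.76) \]

If then t′ is “nearly” efficient, that is, if 1 − E is small, the average value of δ = t′ − t
will be small.

If the maximum likelihood equation is

\[ \frac{\partial \log L}{\partial\theta} = 0, \]

consider

\[ t'' = t' + \operatorname{var} t \left(\frac{\partial \log L}{\partial\theta}\right)_{t'}. \qquad (17.77) \]

We have

\[ \left(\frac{\partial \log L}{\partial\theta}\right)_{t'} = \left(\frac{\partial \log L}{\partial\theta}\right)_{t}
 + (t' - t) \left(\frac{\partial^2 \log L}{\partial\theta^2}\right)_{t} + \text{terms of higher order}
 = (t' - t) \left(\frac{\partial^2 \log L}{\partial\theta^2}\right)_{t} \ \text{approximately.} \qquad (17.78) \]

For large n, approximately

\[ \frac{1}{\operatorname{var} t} = -\left(\frac{\partial^2 \log L}{\partial\theta^2}\right)_{t}, \]

and hence, approximately,

\[ \left(\frac{\partial \log L}{\partial\theta}\right)_{t'} = -\frac{t' - t}{\operatorname{var} t}. \]

Hence

\[ t'' = t' + \operatorname{var} t \left(\frac{\partial \log L}{\partial\theta}\right)_{t'} = t' + t - t' = t, \]

and t″ is an efficient estimator to a better order of approximation. This process may be
repeated and, rather like Newton’s successive approximation to the roots of an equation,
may be expected to improve the efficiency of an estimator.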


Example 17.11 
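Suppose we have to estimate θ, the parameter in the Cauchy population

\[ dF = \frac{1}{\pi} \frac{dx}{1 + (x - \theta)^2}, \qquad -\infty \le x \le \infty. \]

We have already seen that the sample-mean is not a satisfactory estimate and that for
large samples the median is consistent and has variance π²/4n.

The equation of maximum likelihood gives

\[ \frac{\partial \log L}{\partial\theta} = \Sigma \left\{ \frac{2 (x - \theta)}{1 + (x - \theta)^2} \right\} = 0. \]

This is a (2n−1)-ic in θ and correspondingly difficult to solve. We may, however,
find the variance of the solution θ̂ from (17.63). We have

\[ \frac{\partial^2 \log f}{\partial\theta^2} = \frac{2 \{(x - \theta)^2 - 1\}}{\{1 + (x - \theta)^2\}^2}. \]

Hence

\[ \frac{1}{\operatorname{var} \hat{\theta}} = -\frac{n}{\pi} \int_{-\infty}^{\infty} \frac{2 \{(x - \theta)^2 - 1\}}{\{1 + (x - \theta)^2\}^3}\, dx = \frac{n}{2},
 \qquad \operatorname{var} \hat{\theta} = \frac{2}{n}. \]

The median, therefore, has an efficiency of 8/π², or about 0.81, and we expect that

\[ t'' = t' + \operatorname{var} \hat{\theta} \left(\frac{\partial \log L}{\partial\theta}\right)_{t'}
 = t' + \frac{4}{n}\, \Sigma \left\{ \frac{x - t'}{1 + (x - t')^2} \right\}, \]

where t′ denotes the median, will be an improved estimator.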
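
The correction is easily carried out numerically. The sketch below is illustrative only: the sample is simulated (seed, sample size and true value are arbitrary assumptions), the median is taken as the starting value, one correction gives the improved estimator t″, and repeating the step is assumed to settle on a root of the maximum likelihood equation, which it does for samples of this kind.

```python
import math
import random
import statistics

def score(theta, xs):
    """d log L / d theta for the Cauchy location family."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in xs)

def improve(t, xs):
    """One correction step: t + var(theta-hat) * score, with var(theta-hat) = 2/n."""
    return t + (2.0 / len(xs)) * score(t, xs)

random.seed(2024)
theta_true, n = 1.0, 401
# Cauchy variates obtained by inverting the distribution function.
xs = [theta_true + math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]

t = statistics.median(xs)       # the inefficient starting estimator t'
first_step = improve(t, xs)     # t'' as in the example
for _ in range(50):             # repeating the process
    t = improve(t, xs)

print(statistics.median(xs), first_step, t, score(t, xs))
```

The fixed multiplier 2/n replaces the observed second derivative that Newton's method would use, so each step needs only the first derivative of log L.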


Most General Form of Distributions possessing Sufficient Estimators 
17.30. If t is sufficient for θ we have

\[ \frac{\partial \log L}{\partial\theta} = K (t, \theta), \qquad (17.79) \]

where K is some function of t and θ. Regarding this as an equation in t we see that it
remains true for any particular value of θ, say zero. It is then evident that t must be
expressible in the form

\[ t = M \{\Sigma\, k (x)\}, \qquad (17.80) \]

where M and k are arbitrary functions. If w = Σ k (x) then K is a function of θ and
w only, say N (θ, w). We have then

\[ \frac{\partial^2 \log L}{\partial\theta\, \partial x_j} = \frac{\partial N}{\partial w} \frac{\partial w}{\partial x_j}. \qquad (17.81) \]

Now the left-hand side is a function of θ and x_j only and ∂w/∂x_j is a function of x_j only. Hence
∂N/∂w is a function of θ and x_j only. But it must be symmetrical in the x's and hence is a
function of θ only. Hence, integrating with respect to w, we have

\[ N (\theta, w) = w\, p (\theta) + q (\theta), \]

where p and q are arbitrary functions of θ. Thus

\[ \frac{\partial}{\partial\theta} (\log L) = \frac{\partial}{\partial\theta}\, \Sigma \{\log f (x, \theta)\} = p (\theta)\, \Sigma\, k (x) + q (\theta), \qquad (17.82) \]

whence

\[ \frac{\partial}{\partial\theta} \log f (x, \theta) = p (\theta)\, k (x) + \frac{1}{n}\, q (\theta), \]

giving

\[ f (x, \theta) = \exp \{p (\theta)\, k (x) + q (\theta) + r (x)\}, \qquad (17.83) \]

where we still write p and q for the integrated functions.
The expression may also be written

\[ f (x, \theta) = Q (\theta)\, R (x) \exp \{p (\theta)\, k (x)\} \qquad (17.84) \]

or, if we simplify the specification of the distribution by writing θ instead of p (θ),

\[ f (x, \theta) = Q (\theta)\, R (x) \exp \{\theta\, k (x)\}. \qquad (17.85) \]

It will be found that if (17.85) holds, the likelihood function is of the required form for
the existence of a sufficient estimator, so that the equation is sufficient as well as necessary.
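
One practical consequence of this form is that two samples giving the same value of Σ k (x) have log-likelihoods differing only by a quantity free of θ. The sketch below illustrates this for the Type III distribution of Example 17.8, for which k (x) = x and p (θ) = −1/θ; the two samples and the value of p are assumptions made for the illustration.

```python
import math

def log_likelihood(theta, xs, p=2.0):
    """log L for the Type III form of Example 17.8 with p known."""
    n = len(xs)
    return ((p - 1.0) * sum(math.log(x) for x in xs)
            - n * math.lgamma(p)
            - sum(xs) / theta
            - n * p * math.log(theta))

sample_a = [1.0, 2.0, 3.0]
sample_b = [0.5, 2.5, 3.0]      # same sum, hence the same value of sum k(x) with k(x) = x

# The difference of the two log-likelihoods should not vary with theta.
differences = [log_likelihood(th, sample_a) - log_likelihood(th, sample_b)
               for th in (0.5, 1.0, 2.0, 5.0)]
print(differences)
```

With p = 2 the constant difference is the difference of the sums of log x, namely log 6 − log 3.75 = log 1.6.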


Distribution of Sufficient Estimators 
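17.31. It is remarkable that the distribution of a sufficient estimator can be obtained
directly from the likelihood function. From (17.85) we have

\[ \log L = n \log Q + \Sigma \log R (x) + \theta\, \Sigma\, k (x), \]

giving, for the maximum likelihood estimator,

\[ \frac{n}{Q} \frac{\partial Q}{\partial\theta} + \Sigma\, k (x) = 0. \qquad (17.86) \]

Now, for the characteristic function φ (α) of w (= Σ k (x)) we have

\[ \phi (\alpha) = \int \cdots \int e^{i\alpha w} f (x_1, \theta)\, dx_1 \ldots f (x_n, \theta)\, dx_n
 = \left\{ \int e^{i\alpha k(x)} f\, dx \right\}^{n}
 = \left\{ \int Q (\theta)\, R (x)\, e^{(i\alpha + \theta) k(x)}\, dx \right\}^{n}
 = \left\{ \frac{Q (\theta)}{Q (\theta + i\alpha)} \right\}^{n}. \qquad (17.87) \]

Hence the frequency function of w, if existent, is

\[ f (w) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \left\{ \frac{Q (\theta)}{Q (\theta + i\alpha)} \right\}^{n} e^{-i\alpha w}\, d\alpha. \]

Now from (17.86),

\[ w = -n \left(\frac{1}{Q} \frac{\partial Q}{\partial\theta}\right)_{\theta = t} = n\, S (t), \ \text{say,} \]

and hence the frequency function of the estimator t is

\[ f (t) = \frac{n}{2\pi} \left(\frac{\partial S}{\partial t}\right) \int_{-\infty}^{\infty} \left\{ \frac{Q (\theta)}{Q (\theta + i\alpha)} \right\}^{n} e^{-i\alpha n S(t)}\, d\alpha. \qquad (17.88) \]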




Example 17.12 
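The normal distribution with unit variance may be put in the form

\[ f = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}\, e^{-\frac{1}{2}\theta^2}\, e^{\theta x}. \]

Comparing this with (17.85), we see that if

\[ Q (\theta) = e^{-\frac{1}{2}\theta^2}, \qquad R (x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}, \qquad k (x) = x, \]

the condition for a sufficient estimator is satisfied. That this is (as we already know)
the mean x̄ may be confirmed from (17.86). We have

\[ S (\theta) = -\frac{\partial}{\partial\theta} \log Q = \theta, \]

and hence for the frequency function of the estimator x̄,

\[ f = \frac{n}{2\pi} \int_{-\infty}^{\infty} \exp \{-\tfrac{1}{2} n\alpha^2 - i\alpha n (\bar{x} - \theta)\}\, d\alpha
 = \sqrt{\frac{n}{2\pi}} \exp \{-\tfrac{1}{2} n (\bar{x} - \theta)^2\}. \]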


Example 17.13 


The Type III distribution considered in Example 17.8 may be put in the slightly
different form

\[ dF = \frac{\gamma^{p}}{\Gamma (p)}\, x^{p-1} e^{-\gamma x}\, dx, \qquad 0 \le x \le \infty. \]

Regarding p as known and considering γ as the parameter under estimate, we see that
a sufficient estimator exists, because we may define

\[ Q (\gamma) = \gamma^{p}, \qquad R (x) = \frac{x^{p-1}}{\Gamma (p)}, \qquad k (x) = -x, \]

which throws the distribution into the form (17.85). We have found the estimator and
its distribution in Example 17.8.

On the other hand, suppose that γ is known and we wish to estimate p. Writing

\[ Q (p) = \frac{\gamma^{p}}{\Gamma (p)}, \qquad R (x) = e^{-\gamma x - \log x}, \qquad k (x) = \log x, \]

we see that a sufficient estimator for p also exists. It is the solution of

\[ -\frac{d}{dp} \log \Gamma (p) + \log \gamma + \frac{1}{n}\, \Sigma \log x = 0, \]

which does not permit of expression of p as a simple function of the x's. The sampling
distribution is not expressible in a simple form.


Example 17.14 
Consider again the Cauchy distribution

\[ dF = \frac{1}{\pi} \frac{dx}{1 + (x - \theta)^2}, \qquad -\infty \le x \le \infty. \]

Evidently this cannot be thrown into the form (17.85) and hence no sufficient estimator
exists. We have already found (Example 17.11) that there is an efficient estimator. For
finite n no single estimator can contain all that the sample can tell us about θ.


Sufficient Estimators when the Range depends on the Parameter 

17.32. One of the conditions of the theorem of 17.23 and that of 17.27 is that the
range should be independent of θ. In the contrary case our results, particularly for sufficient
estimators, require reconsideration.

Suppose the range of the frequency function is from θ to b, where b is fixed. If there
is a sufficient estimator for θ, say t, the distribution of any other estimator for fixed t is
independent of θ. Take x₁, the lowest value of the sample, as such other estimator. Then
if t is fixed the distribution of x₁ is independent of θ, which is clearly impossible unless in
fixing t we also fix x₁, that is to say, t is a function of x₁. Thus if a sufficient estimator
exists it must be a function of x₁.

Similarly if the range is from a to θ, a sufficient estimator for θ must be a function
of the largest sample member.


17.33. If x₁ or some function of it is sufficient for θ, the lower extremity of the range,
and x₁ is fixed, the probability that any particular sample value x is greater than x₁ is
proportional to f (x, θ). This must be independent of θ, since x₁ is sufficient, and hence
so is f (x, θ)/f (x₁, θ). Thus

\[ f (x, \theta) = \frac{g (x)}{h (\theta)}, \qquad (17.89) \]

and this is the most general form admitting a sufficient estimator.

It remains true in such circumstances that the smallest member of the sample is
a maximum likelihood estimator. For the likelihood is

\[ L = \frac{g (x_1)\, g (x_2) \cdots g (x_n)}{\{h (\theta)\}^{n}}, \]

which is clearly a maximum when h (θ) is a minimum. Now since the total frequency is
unity we have, from (17.89),

\[ h (\theta) = \int_{\theta}^{b} g (x)\, dx. \qquad (17.90) \]

θ cannot be greater than x₁, for then such a sample value could not appear. The value
which minimises h (θ) is seen from (17.90) to be that which minimises the range, i.e. x₁.


17.34. When both extremes of the range, a and b, depend on θ, some further
modification is necessary. Suppose that a is equal to θ and that b (θ) is some strictly decreasing
function of θ. Let Xₙ be the value such that b (Xₙ) = xₙ, the greatest member of the
sample, and let t be the smaller of x₁ and Xₙ. Then of the equalities

\[ t = x_1, \qquad b (t) = x_n, \qquad (17.91) \]

one at least is true. But the first equality implies that t ≥ θ and the second that
b (t) ≤ b (θ), and either of these two inequalities implies the other. Hence both inequalities
are true, and

\[ \theta \le t \le x_1 \le x_n \le b (t) \le b (\theta). \qquad (17.92) \]

Samples with fixed t then lie in a fixed range, and hence t is sufficient if the frequency
function is of the form (17.89). It would seem that this remains the most general form of
frequency function admitting a sufficient estimator when both extremes of the range
depend on θ.


Example 17.15 
Consider the rectangular distribution

\[ dF = \frac{dx}{2\theta}, \qquad -\theta \le x \le \theta. \]

If we take the ordinary likelihood equation we get

\[ \frac{\partial \log L}{\partial\theta} = \frac{\partial}{\partial\theta} \{-n \log (2\theta)\} = -\frac{n}{\theta}. \]

For this to vanish θ must tend to infinity, an obviously nugatory result. In accordance
with the above discussion we should take as our estimate of the lower terminal −θ the
smaller of x₁ and −xₙ (so that θ itself is estimated by the larger of −x₁ and xₙ),
and this is obviously sufficient, for nothing in the sample can tell us more about the
terminals of the range than its most extreme members.
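
A short numerical sketch may make the point concrete. The sample below is an assumption made for illustration; the likelihood is (2θ)^(−n) for any θ large enough to contain the whole sample and zero otherwise, so it is greatest at the smallest admissible θ, which is fixed by the extreme members of the sample alone.

```python
import math

def log_likelihood(theta, xs):
    """log L for dF = dx / (2 theta), -theta <= x <= theta; -inf if any x is out of range."""
    if theta <= 0 or any(abs(x) > theta for x in xs):
        return -math.inf
    return -len(xs) * math.log(2.0 * theta)

sample = [-1.7, 0.4, 2.3, -0.9, 1.1]          # illustrative data
x_1, x_n = min(sample), max(sample)
lower_terminal = min(x_1, -x_n)               # the smaller of x_1 and -x_n
theta_hat = -lower_terminal                   # equivalently the larger of -x_1 and x_n

# A grid search over theta confirms where the likelihood is greatest.
grid = [i / 1000.0 for i in range(1, 5001)]
best_on_grid = max(grid, key=lambda th: log_likelihood(th, sample))
print(theta_hat, best_on_grid)
```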


Intrinsic Accuracy 
17.35. If the sampling distribution of an estimator t is

\[ dF = \phi (t, \theta)\, dt \qquad (17.93) \]

we define the accuracy of t as

\[ I' = \int \left(\frac{\partial \phi}{\partial\theta}\right)^2 \frac{1}{\phi}\, dt
 = E \left(\frac{\partial \log \phi}{\partial\theta}\right)^2. \qquad (17.94) \]

It is evidently essentially a positive quantity. We assume, unless the contrary is stated,
that the range is independent of θ.

I′ is the quantity we have already encountered in (17.67) as the reciprocal of the
variance of t when it tends to normality in large samples. As in 17.27, we have

\[ I' \le \int \cdots \int \left(\frac{\partial \log L}{\partial\theta}\right)^2 L\, dx_1 \ldots dx_n \qquad (17.95) \]

= nI, say, where

\[ I = \int \left(\frac{\partial \log f}{\partial\theta}\right)^2 f\, dx. \qquad (17.96) \]

Now I is independent of the estimator t and we may call it the intrinsic accuracy of
the distribution f in regard to θ. It is intrinsic because it depends only on f. It may
be termed accuracy because it provides, for large samples at least, a minimum to the
variance of possible estimators of θ. We know from 17.25 that under certain conditions
the maximum likelihood estimator attains this minimum for large samples.


17.36. We may now extend the definition of efficiency of an estimator to the case 
of small samples. In fact, the efficiency is the ratio of the accuracy of an estimator to the 
intrinsic accuracy of the distribution for the parameter under estimate. This is easily 
seen to apply to the case of large samples for which efficiency was defined in 17.12, and 
may be applied to finite samples or non-normal sampling variation. For such cases, 
however, it is conceivable that the efficiency might exceed unity. A proof that this is not 
so when the range is independent of θ is suggested in Exercise 17.12.


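17.37. If the range is independent of θ we have

\[ E \left(\frac{\partial \log f}{\partial\theta}\right) = \int \frac{\partial f}{\partial\theta}\, dx = \frac{\partial}{\partial\theta} \int f\, dx = 0 \]

and hence the following three expressions for the intrinsic accuracy are equivalent:

\[ E \left(\frac{\partial \log f}{\partial\theta}\right)^2, \qquad
 -E \left(\frac{\partial^2 \log f}{\partial\theta^2}\right), \qquad
 \operatorname{var} \left(\frac{\partial \log f}{\partial\theta}\right). \qquad (17.97) \]

This equivalence also holds if f is zero at the extremes of the range. For we then have

\[ 0 = \frac{\partial}{\partial\theta} \int_a^b f\, dx
 = \int_a^b \frac{\partial f}{\partial\theta}\, dx - \frac{\partial a}{\partial\theta} f (a, \theta) + \frac{\partial b}{\partial\theta} f (b, \theta)
 = \int_a^b \frac{\partial f}{\partial\theta}\, dx. \]

But if f is not zero at the extremes the equivalence may break down. (Cf. Exercises 17.9
and 17.11.)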


Amount of Information 
17.38. The quantity nI has been called the amount of information about θ in the
sample of n, and I may be called the amount of information per member of the sample.
The use of “information” in this specialised sense has not been universally accepted,
but some of the properties of I are such as we should require of any measure of information.

(a) If the parent does not contain θ, I = 0 so that no sample can tell us anything
about θ, which must obviously be so.

(b) Since sufficient estimators contain all the relevant information in the sample
we expect their accuracy to be nI, and conversely. That this is so may be seen as
in 17.27 and 17.28. In fact, if t is such that the equality in (17.72) holds, var u = 0
and for fixed t, ∂ log L/∂θ is constant, irrespective of the form of distribution of t. Log L
is then of the type required for sufficiency.

(c) The sum of the amounts of information in two independent sample-members
is the amount of information in the pair taken together. For if their joint distribution is

\[ dF = f_1 (x, \theta)\, dx\, f_2 (y, \theta)\, dy, \]

we have for the intrinsic accuracy

\[ -\iint \frac{\partial^2 \log (f_1 f_2)}{\partial\theta^2}\, f_1 f_2\, dx\, dy
 = -\iint \frac{\partial^2 \log f_1}{\partial\theta^2}\, f_1 f_2\, dx\, dy - \iint \frac{\partial^2 \log f_2}{\partial\theta^2}\, f_1 f_2\, dx\, dy
 = -\int \frac{\partial^2 \log f_1}{\partial\theta^2}\, f_1\, dx - \int \frac{\partial^2 \log f_2}{\partial\theta^2}\, f_2\, dy, \qquad (17.98) \]

which is the property stated.


Loss of Accuracy 

17.39. Where no sufficient estimator exists, it follows from (b) of the previous paragraph
that no estimator for finite $n$ can contain all the information in the sample. In
so far as any particular estimator falls short of the ideal we may be said to lose information
by using it. No estimator can avoid losing something, although of course some may
lose less than others.

Presumably the loss will be greater for large samples than for small ones, and will
be least for maximum likelihood estimators. We may calculate the loss in this case. If
$t$ is the maximum likelihood estimator of $\theta$, we have, to a first approximation,

$$\frac{\partial \log L}{\partial\theta} = (\theta - t)\,\frac{\partial^{2} \log L}{\partial\theta^{2}}. \qquad (17.99)$$
The variance of $\partial \log L/\partial\theta$ in samples for which $t$ is constant is thus the variance of
$\partial^{2} \log L/\partial\theta^{2}$ within the set multiplied by $(t - \theta)^{2}$. Now the total loss of information,
from 17.27, is seen to be var $u$, and hence is equal to the variance of $t$ multiplied
by the total variance of $\partial^{2} \log L/\partial\theta^{2}$ within sets for which $t$ is constant. This we now evaluate.
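Suppose the distribution is grouped so that the “expected” frequency in the $j$th
group is $m_j$. The likelihood is then proportional to $m_1^{n_1} m_2^{n_2} \ldots$ and apart from
constants independent of $\theta$ we have

$$\log L = \Sigma\, n_j \log m_j, \qquad (17.100)$$

$$\frac{\partial \log L}{\partial\theta} = \Sigma\left(n\,\frac{m'}{m}\right), \quad\text{where } m' = \frac{\partial m}{\partial\theta}, \qquad (17.101)$$

$$\frac{\partial^{2} \log L}{\partial\theta^{2}} = \Sigma\left\{n\left(\frac{m''}{m} - \frac{m'^{2}}{m^{2}}\right)\right\}. \qquad (17.102)$$

We have at once

$$\frac{1}{\operatorname{var} t} = -E\left(\frac{\partial^{2} \log L}{\partial\theta^{2}}\right) = \Sigma\left(\frac{m'^{2}}{m}\right). \qquad (17.103)$$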


We shall find it most convenient to regard the $n$'s as distributed over the groups first of
all without restriction and then subject to two linear constraints expressed by $\Sigma(n_j) = n$ and

$$\frac{\partial \log L}{\partial\theta} = \Sigma\left(n\,\frac{m'}{m}\right) = \text{constant}.$$

From this viewpoint the $n$'s may be regarded as distributed in the Poisson form with mean
and variance $m$ (not the binomial because we are not introducing the restriction that the
samples should be of fixed size, except as a constraint).

Now if $\Sigma(k_j n_j)$ is a linear function of the $n$'s subject to a linear constraint $\Sigma(\alpha_j n_j) = p$,
its variance is

$$\Sigma(k^{2} m) - \frac{\Sigma^{2}(k\alpha m)}{\Sigma(\alpha^{2} m)}, \qquad (17.104)$$

and a second constraint reduces the variance by a term similar to the second in this expression.
The result may be seen from geometrical considerations. We may write

$$\Sigma(kn) = \Sigma\left(k\sqrt{m}\cdot\frac{n}{\sqrt{m}}\right) \quad\text{and}\quad \Sigma(\alpha n) = \Sigma\left(\alpha\sqrt{m}\cdot\frac{n}{\sqrt{m}}\right),$$

where the variables $n/\sqrt{m}$ have unit variance and mean $\sqrt{m}$. Consider the different values
of the $n/\sqrt{m}$'s, say $s$ in number, as the co-ordinates in a Euclidean space. The density function
of the variables is then symmetrical about a point $(\sqrt{m_1}, \sqrt{m_2}, \ldots, \sqrt{m_s})$ to which we
transfer the origin. The variance of the unconstrained variables is then equal to the
reciprocal of the squared distance from the origin to the hyperplane $\Sigma(k\sqrt{m}\,x) = 1$, namely, to
$\Sigma(k^{2} m)$. But when the constraint is imposed, the variance becomes proportional to the
reciprocal of the squared distance from the origin to the hyperplane in the direction parallel to
$\Sigma(\alpha\sqrt{m}\,x) = 0$ and is hence reduced by the amount

$$\cos^{2}\phi\;\Sigma(k^{2} m),$$

where $\phi$ is the angle between the planes. This quantity is

$$\frac{\Sigma^{2}(k\sqrt{m}\cdot\alpha\sqrt{m})}{\Sigma(k^{2} m)\,\Sigma(\alpha^{2} m)}\;\Sigma(k^{2} m) = \frac{\Sigma^{2}(k\alpha m)}{\Sigma(\alpha^{2} m)},$$

which gives us the second term in (17.104).

Now for the first linear constraint $\Sigma(n) = \text{constant} = n$ we have $\alpha = 1$, and the
reducing term is (since $\Sigma(m) = n$ also)

$$\frac{1}{n}\,\Sigma^{2}(km).$$

For the second constraint we have $\alpha = m'/m$ and hence the term is

$$\frac{\Sigma^{2}(km')}{\Sigma(m'^{2}/m)}.$$

Thus the variance of $\Sigma(kn)$ is

$$\Sigma(k^{2} m) - \frac{1}{n}\,\Sigma^{2}(km) - \frac{\Sigma^{2}(km')}{\Sigma(m'^{2}/m)}. \qquad (17.105)$$

Now taking

$$k = \frac{m''}{m} - \frac{m'^{2}}{m^{2}}$$

and remembering that

$$\Sigma(m') = \Sigma(m'') = 0,$$

we see from (17.102) that the loss of information is, for large samples,

$$\frac{1}{\Sigma(m'^{2}/m)}\left[\Sigma\left\{\frac{1}{m}\left(m'' - \frac{m'^{2}}{m}\right)^{2}\right\} - \frac{1}{n}\,\Sigma^{2}\left(\frac{m'^{2}}{m}\right) - \frac{\Sigma^{2}\left\{\dfrac{m'}{m}\left(m'' - \dfrac{m'^{2}}{m}\right)\right\}}{\Sigma(m'^{2}/m)}\right]. \qquad (17.106)$$

By considering the width of the groups as tending to zero we may apply this result
also to continuous distributions.


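Example 17.16
In the distribution

$$dF = \frac{1}{\pi}\,\frac{dx}{1 + (x - \theta)^{2}}, \qquad -\infty \le x \le \infty,$$

there is no sufficient estimator, as we have seen. Let us consider the loss of information
consequent upon using the maximum likelihood estimator.
We may write for our “expected” value $m$

$$m = \frac{n\,dx}{\pi\{1 + (x - \theta)^{2}\}}.$$

Hence, measuring $x$ from $\theta$,

$$\Sigma\left(\frac{m'^{2}}{m}\right) = \frac{n}{\pi}\int_{-\infty}^{\infty}\frac{4x^{2}\,dx}{(1 + x^{2})^{3}} = \frac{n}{2},$$

$$\Sigma\left\{\frac{1}{m}\left(m'' - \frac{m'^{2}}{m}\right)^{2}\right\} = \frac{n}{\pi}\int_{-\infty}^{\infty}\frac{4(x^{2} - 1)^{2}\,dx}{(1 + x^{2})^{5}} = \frac{7n}{8},$$

$$\Sigma\left\{\frac{m'}{m}\left(m'' - \frac{m'^{2}}{m}\right)\right\} = 0.$$

Hence, from (17.106), the loss of information is

$$\frac{7}{4} - \frac{1}{2} = \frac{5}{4}.$$

The intrinsic accuracy of the original distribution is $\tfrac{1}{2}$, so the loss of information is equivalent
to $2\tfrac{1}{2}$ observations for large samples. For small samples it will presumably be smaller,
since it vanishes for samples of one. The loss by use of the maximum likelihood estimator
is therefore very slight and becomes of diminishing importance as the size of the sample
increases.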
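The three sums used in this example can be checked by direct numerical integration. The following is a minimal sketch under assumptions made here, not the author's computation: it works per observation (dividing the sums by $n$), substitutes $u = x - \theta = \tan\phi$ so that $\phi$ is uniformly distributed, and uses illustrative function names.

```python
import math

def cauchy_mean(g, steps=20_000):
    """E[g(u)] for a standard Cauchy variate u = x - theta.

    With u = tan(phi), phi is uniform on (-pi/2, pi/2), so the expectation
    is the average of g(tan(phi)); the midpoint rule is used here.
    """
    h = math.pi / steps
    return sum(g(math.tan(-math.pi / 2 + (i + 0.5) * h)) for i in range(steps)) / steps

def score(u):
    """d log f / d theta, evaluated at u = x - theta."""
    return 2.0 * u / (1.0 + u * u)

def second(u):
    """d^2 log f / d theta^2, evaluated at u = x - theta."""
    return -2.0 * (1.0 - u * u) / (1.0 + u * u) ** 2

info = cauchy_mean(lambda u: score(u) ** 2)             # intrinsic accuracy I
mean_second = cauchy_mean(second)                       # equals -I
mean_second_sq = cauchy_mean(lambda u: second(u) ** 2)  # first term of (17.106), per observation
cross = cauchy_mean(lambda u: score(u) * second(u))     # vanishes by symmetry

# Loss of information from (17.106), and its equivalent in observations.
loss = (mean_second_sq - mean_second ** 2 - cross ** 2 / info) / info
observations_lost = loss / info
```

Under these assumptions the computed values should come out close to 1/2, 7/8, 5/4 and 2.5 for `info`, `mean_second_sq`, `loss` and `observations_lost` respectively.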


Ancillary Estimators 


17.40. Where no sufficient estimator exists no single estimator can avoid the loss
of information; but we may take an additional function of the variables which, together
with the maximum likelihood estimator, will give an accuracy tending to unity in large
samples. By taking a third function we can improve the accuracy still further, and so
on. The process is analogous to approximating to the value of a function (the likelihood
function) by ascertaining its differential coefficients at some particular point of the range.

In fact, suppose that, in addition to the estimator which gives $\partial \log L/\partial\theta$ for some value
of $\theta$ such as $t$, we also find $\partial^{2} \log L/\partial\theta^{2}$ for that value. The variance of $\partial \log L/\partial\theta$ over values
in the neighbourhood of those for which these two are constant is then, to the first
approximation, the variance of

$$\tfrac{1}{2}(\theta - t)^{2}\,\frac{\partial^{3} \log L}{\partial\theta^{3}},$$

which has ordinarily a mean value and variance of lower order in $n$. In particular, if $t$
is the maximum likelihood estimator, so that $(\partial \log L/\partial\theta)_{\theta = t} = 0$, the value of
$(\partial^{2} \log L/\partial\theta^{2})_{\theta = t}$ may provide supplementary information which enables us to
approximate more closely to the likelihood function and hence salvage some of the lost
information. Such a quantity is accordingly called an ancillary estimator. Cf. 17.29 above.


Multivariate Distributions with One Parameter 

17.41. We now proceed to consider the extension of some of the foregoing results 
in two directions: (a) where there is more than one variate but still only one parameter, 
and (b) where there is more than one parameter to be estimated. 

The former raises no new point of difficulty. To take the bivariate case as an example,
if the frequency function is $f(x, y, \theta)$, the likelihood is

$$L = f(x_1, y_1, \theta)\, f(x_2, y_2, \theta) \cdots f(x_n, y_n, \theta) \qquad (17.107)$$

and our maximum likelihood estimator is obtained by maximising $L$ in the usual way.


Example 17.17 
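To estimate the parameter $\rho$ in samples of $n$ from

$$dF = \frac{1}{2\pi\sqrt{(1 - \rho^{2})}}\exp\left\{-\frac{1}{2(1 - \rho^{2})}(x^{2} - 2\rho xy + y^{2})\right\}dx\,dy.$$

We find

$$\log L = \text{constant} - \frac{n}{2}\log(1 - \rho^{2}) - \frac{1}{2(1 - \rho^{2})}\{\Sigma(x^{2}) - 2\rho\,\Sigma(xy) + \Sigma(y^{2})\},$$

whence, for $\partial \log L/\partial\rho = 0$ we have

$$\frac{n\rho}{1 - \rho^{2}} - \frac{\rho}{(1 - \rho^{2})^{2}}\{\Sigma(x^{2}) - 2\rho\,\Sigma(xy) + \Sigma(y^{2})\} + \frac{1}{1 - \rho^{2}}\,\Sigma(xy) = 0,$$

reducing to the cubic in $\rho$,

$$\rho(1 - \rho^{2}) + (1 + \rho^{2})\,\frac{1}{n}\Sigma(xy) - \rho\left\{\frac{1}{n}\Sigma(x^{2}) + \frac{1}{n}\Sigma(y^{2})\right\} = 0.$$

It is interesting to note that this does not yield the product-moment of the sample.

We have, after a little reduction,

$$\frac{\partial^{2} \log f}{\partial\rho^{2}} = \frac{1 + \rho^{2}}{(1 - \rho^{2})^{2}} - \frac{1 + 3\rho^{2}}{(1 - \rho^{2})^{3}}(x^{2} - 2\rho xy + y^{2}) + \frac{4\rho}{(1 - \rho^{2})^{2}}\,xy.$$

Since $E(x^{2}) = E(y^{2}) = 1$ and $E(xy) = \rho$, we have, for the estimator $\hat\rho$,

$$-\frac{1}{n \operatorname{var}\hat\rho} = \frac{1 + \rho^{2}}{(1 - \rho^{2})^{2}} - \frac{2(1 + 3\rho^{2})}{(1 - \rho^{2})^{2}} + \frac{4\rho^{2}}{(1 - \rho^{2})^{2}},$$

whence

$$\operatorname{var}\hat\rho = \frac{(1 - \rho^{2})^{2}}{n(1 + \rho^{2})}.$$

This is less (and may be considerably less) than the variance of the sample product-moment
in large samples, $(1 - \rho^{2})^{2}/n$. The efficiency of the latter is $1/(1 + \rho^{2})$.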
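The cubic of this example has no convenient closed-form root, but it is easily solved numerically. The sketch below is an illustration under assumptions made here: the data are taken to have zero means and unit variances as in the example, the helper names are hypothetical, and bisection on $(-1, 1)$ is used because the left-hand side is non-negative at $\rho = -1$ and non-positive at $\rho = +1$. If the cubic has more than one root in the interval, bisection returns only one of them.

```python
def moment_means(xs, ys):
    """Return a = mean(x^2) + mean(y^2) and b = mean(xy)."""
    n = len(xs)
    a = (sum(x * x for x in xs) + sum(y * y for y in ys)) / n
    b = sum(x * y for x, y in zip(xs, ys)) / n
    return a, b

def rho_cubic(rho, a, b):
    """Left-hand side of the likelihood cubic in rho."""
    return rho * (1.0 - rho * rho) + (1.0 + rho * rho) * b - rho * a

def mle_rho(xs, ys, tol=1e-12):
    """Root of the cubic in [-1, 1] found by bisection.

    At rho = -1 the cubic equals a + 2b >= 0 and at rho = +1 it equals
    2b - a <= 0, so a sign change is bracketed.
    """
    a, b = moment_means(xs, ys)
    lo, hi = -1.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rho_cubic(mid, a, b) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative data only (not from the text).
xs = [1.2, -0.4, 0.3, -1.1, 0.7, -0.6]
ys = [0.9, -0.1, 0.5, -1.3, 0.2, -0.8]
rho_hat = mle_rho(xs, ys)
```

The value returned can be compared with the sample product-moment $\Sigma(xy)/n$ to see that the two estimators differ in general.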


Simultaneous Estimation of Several Parameters 

17.42. We now turn to the case when the unknown parameters are more than one
in number. To simplify the exposition we shall consider the case of two parameters $\theta_1$
and $\theta_2$, but examples not infrequently arise where more than two have to be estimated;
for instance, in the fitting of certain Pearson curves there are four. To fix the ideas,
consider the normal distribution

$$dF = \frac{1}{\theta_2\sqrt{(2\pi)}}\exp\left\{-\frac{1}{2\theta_2^{2}}(x - \theta_1)^{2}\right\}dx, \qquad -\infty \le x \le \infty.$$

The likelihood function, except for constants, is given by

$$\log L = -n\log\theta_2 - \frac{1}{2\theta_2^{2}}\,\Sigma(x - \theta_1)^{2}. \qquad (17.108)$$

It is natural to generalise our principle of estimation by looking for estimators which shall
maximise $L$ for independent simultaneous variations of $\theta_1$ and $\theta_2$, i.e. to require that

$$\frac{\partial \log L}{\partial\theta_1} = 0, \qquad \frac{\partial \log L}{\partial\theta_2} = 0. \qquad (17.109)$$

In our case this leads to

$$\Sigma(x - \theta_1) = 0,$$

$$-\frac{n}{\theta_2} + \frac{1}{\theta_2^{3}}\,\Sigma(x - \theta_1)^{2} = 0,$$

whence for the estimators $\hat\theta_1$ and $\hat\theta_2$,

$$\hat\theta_1 = \frac{1}{n}\Sigma(x) = \bar{x}, \qquad (17.110)$$

$$\hat\theta_2^{2} = \frac{1}{n}\Sigma(x - \bar{x})^{2} = s^{2}. \qquad (17.111)$$

Thus the sample mean and variance are estimates of the population mean and variance.
We note incidentally that the estimator $\hat\theta_2^{2}$ is biassed.


17.43. There is one possible source of confusion here which should be removed.
If we know $\theta_1$ then $\hat\theta_2$ is given by

$$\hat\theta_2^{2} = \frac{1}{n}\Sigma(x - \theta_1)^{2}, \qquad (17.112)$$

which is not the same as (17.111), the sample-mean $\bar{x}$ having been replaced by the known
quantity $\theta_1$. Suppose then we estimate $\theta_1$ by $\bar{x}$, as we may do whether we know $\theta_2$ or not,
since (17.110) does not contain $\theta_2$. We may then ask, what is the estimator of $\theta_2$ which
maximises the likelihood for all samples giving the ascertained value of $\hat\theta_1$, namely, $\bar{x}$?

This is an entirely different question from the one which gave rise to (17.111) and we
must not be surprised if it has a different answer. The variations of $L$ from sample to
sample are now considered in a certain sub-population for which $\bar{x}$ has a fixed value.

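In our particular case the problem can be solved explicitly. The likelihood function
can be thrown into the form, with variables $\bar{x}$ and $s$,

$$L\,d\bar{x}\,ds = \frac{\sqrt{n}}{\theta_2\sqrt{(2\pi)}}\exp\left\{-\frac{n}{2\theta_2^{2}}(\bar{x} - \theta_1)^{2}\right\}d\bar{x}
\times \frac{2\left(\tfrac{1}{2}n\right)^{\frac{1}{2}(n-1)}}{\Gamma\{\tfrac{1}{2}(n-1)\}}\,\frac{s^{n-2}}{\theta_2^{\,n-1}}\exp\left(-\frac{ns^{2}}{2\theta_2^{2}}\right)ds, \qquad (17.113)$$

where $s^{2}$ is the sample variance.

If we maximise the likelihood in this form for simultaneous variations of $\theta_1$ and $\theta_2$
we arrive back at (17.110) and (17.111), as of course we must. But if $\bar{x}$ has a fixed value,
the distribution of $s$ becomes of one lower degree of freedom. The likelihood is then
proportional to the second factor in (17.113), viz.

$$\frac{s^{n-2}}{\theta_2^{\,n-1}}\exp\left(-\frac{ns^{2}}{2\theta_2^{2}}\right),$$

and for variations of $\theta_2$ this is maximised by

$$\hat\theta_2^{2} = \frac{n}{n-1}\,s^{2} = \frac{1}{n-1}\,\Sigma(x - \bar{x})^{2}. \qquad (17.114)$$

This, it may be noticed, is an unbiassed estimator.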
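The two estimators of the variance, (17.111) and (17.114), differ only in their divisors, as the following minimal sketch shows. The function names and the data are illustrative assumptions made here and are not taken from the text.

```python
def joint_ml_variance(xs):
    """Estimator (17.111): sum of squares about the mean divided by n."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / n

def conditional_ml_variance(xs):
    """Estimator (17.114): sum of squares about the mean divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

# Illustrative sample: mean 5, sum of squares about the mean 32.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
v_joint = joint_ml_variance(sample)        # 32 / 8
v_cond = conditional_ml_variance(sample)   # 32 / 7
```

The ratio of the two is $n/(n-1)$, so the difference becomes unimportant as $n$ increases.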


17.44. The difference between (17.111) and (17.114) is apt to be confusing, for both
are, in a sense, maximum likelihood estimators. The distinction arises from the fact that
we are considering the variation of $L$ in two different populations, the first over all samples
of size $n$, the second over the more restricted samples subject to the further constraint
$\Sigma(x) = \text{constant}$. The difference when $n$ is large, of course, is quite unimportant, but
as a theoretical matter the point has some interest.

Which of the two is employed for practical estimation is a matter of choice. At first
sight it may strike the reader as objectionable to use (17.114), because $\bar{x}$ is not known before
the sample is drawn, and there are obvious dangers in basing an inference on properties
of the sample which are determined a posteriori. This objection, however, does not lie
in the present case. We make up our mind beforehand that, whatever $\bar{x}$ may turn out
to be, we will make an inference in relation to the sub-population of samples determined
by it. There is, in fact, no posterior determination of the rule of inference.


17.45. Possibly without realising it, the reader is already accustomed to make an
inference of this kind in relation to a sample number. We do not usually determine beforehand
what size the sample must be; our results (apart from the distinction between small
and large samples, which is another matter) are true for any $n$, whatever $n$ may turn out
to be in practice. In the same way the estimator (17.114) is a maximum likelihood estimator,
whatever $\bar{x}$ may turn out to be, $\bar{x}$ being a property of the sample, just as $n$ is.

The fact remains, of course, that (17.111) and (17.114) give different results. Which
is the better? The answer depends on what we require of the estimator. If we wish
to choose $\theta_1$ and $\theta_2$ so as to maximise their joint likelihood we choose (17.111). If we wish
to select them so that the likelihood is maximised for $\theta_1$ and then, for the observed $\bar{x}$, is
maximised for $\theta_2$, we choose (17.114).


17.46. It may be shown that, as for the case of one parameter, the likelihood estimators
of several parameters are consistent under very general conditions and tend for
large $n$ to be distributed in the multivariate normal form. We omit the proof of these results,
which the reader will probably be willing to accept, and proceed to a generalisation of
the theorem of 17.26. Thus:
(a) If the frequency function $f(x, \theta_1, \theta_2, \ldots \theta_p)$ is continuous in $x$, and
(b) if in a certain interval containing the true values $\theta_{10}, \theta_{20}, \ldots \theta_{p0}$, $\partial \log L/\partial\theta_j$ is
continuous in $\theta_j$ for every $x$, $\frac{1}{n}\,\partial^{2} \log L/\partial\theta_j^{2}$ approaches a continuous function of $\theta_j$ for large
$n$, and $\partial^{2} \log L/\partial\theta_j^{2}$ does not vanish in some interval, then

$$n \operatorname{cov}(\hat\theta_j, \hat\theta_k) = \frac{\Delta_{jk}}{\Delta}, \qquad (17.115)$$

where $\Delta$ is the (Hessian) determinant

$$\Delta = \left|\int\left(\frac{\partial \log f}{\partial\theta_j}\right)\left(\frac{\partial \log f}{\partial\theta_k}\right) f\,dx\right| \qquad (17.116)$$

and $\Delta_{jk}$ is the minor of the $j$th row and $k$th column. When $p = 1$ this reduces to the
case of a single parameter.


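As $n$ tends to infinity the joint distribution of the maximum likelihood estimators
tends to the form

$$f = k\exp\left\{-\tfrac{1}{2}\,\Sigma\,\lambda_{jk}(\hat\theta_j - \theta_{j0})(\hat\theta_k - \theta_{k0})\right\}. \qquad (17.117)$$

The theorem will be established if we show that

$$\lambda_{jk} = n\int\left(\frac{\partial \log f}{\partial\theta_j}\right)_0\left(\frac{\partial \log f}{\partial\theta_k}\right)_0 f\,dx, \qquad (17.118)$$

for then the values of the variances and covariances of the $\hat\theta$'s are as stated in (17.115).
(Compare 15.12.)
Make the transformation

$$q_h = \sum_j l_{hj}(\hat\theta_j - \theta_{j0}) \qquad (17.119)$$

and choose the $l$'s so that the exponential of (17.117) becomes

$$-\tfrac{1}{2}\,\Sigma\, q_h^{2}.$$

Then

$$\lambda_{jk} = \sum_h l_{hj}\,l_{hk}. \qquad (17.120)$$

The $q$'s are independent normal variates with variance 1. Hence, from the theorem for
the case of a single parameter, already proved, we have

$$n\int\left(\frac{\partial \log f}{\partial q_h}\right)^{2} f\,dx = 1. \qquad (17.121)$$

Further, we have

$$\int\frac{\partial \log f}{\partial q_h}\,\frac{\partial \log f}{\partial q_i}\, f\,dx = 0, \qquad h \ne i, \qquad (17.122)$$

for if we put

$$q_h = \frac{1}{\sqrt{2}}(u_h - u_i) \quad\text{and}\quad q_i = \frac{1}{\sqrt{2}}(u_h + u_i)$$

the expression becomes one half of

$$\int\left\{\left(\frac{\partial \log f}{\partial u_h}\right)^{2} - \left(\frac{\partial \log f}{\partial u_i}\right)^{2}\right\} f\,dx,$$

which vanishes since the $u$'s have the same variance as the $q$'s.

Now

$$\left(\frac{\partial \log f}{\partial\theta_j}\right)_0 = -\sum_h l_{hj}\,\frac{\partial \log f}{\partial q_h}.$$

Hence

$$\int\left(\frac{\partial \log f}{\partial\theta_j}\right)_0\left(\frac{\partial \log f}{\partial\theta_k}\right)_0 f\,dx = \sum_h\sum_i l_{hj}\,l_{ik}\int\frac{\partial \log f}{\partial q_h}\,\frac{\partial \log f}{\partial q_i}\, f\,dx = \frac{1}{n}\sum_h l_{hj}\,l_{hk}$$

in virtue of (17.121) and (17.122),

$$= \frac{\lambda_{jk}}{n}$$

from (17.120). The theorem follows.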


Example 17.18 
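Let us estimate the five parameters of the bivariate normal form

$$dF = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{(1 - \rho^{2})}}\exp\left[-\frac{1}{2(1 - \rho^{2})}\left\{\left(\frac{x - \alpha}{\sigma_1}\right)^{2} - \frac{2\rho(x - \alpha)(y - \beta)}{\sigma_1\sigma_2} + \left(\frac{y - \beta}{\sigma_2}\right)^{2}\right\}\right]dx\,dy, \qquad -\infty \le x,\, y \le \infty.$$

It will be found that the partial differential coefficients of log $L$ yield, on solution, the
estimators

$$\hat\alpha = \bar{x}, \qquad \hat\beta = \bar{y},$$

$$\hat\sigma_1^{2} = \frac{1}{n}\Sigma(x - \bar{x})^{2}, \qquad \hat\rho\,\hat\sigma_1\hat\sigma_2 = \frac{1}{n}\Sigma(x - \bar{x})(y - \bar{y}), \qquad \hat\sigma_2^{2} = \frac{1}{n}\Sigma(y - \bar{y})^{2},$$

so that for simultaneous estimation the sample means, variances and covariances are
estimates of the corresponding parameters.
To evaluate the sampling variances and covariances we have to evaluate integrals
of the type

$$\iint\frac{\partial \log f}{\partial\theta_j}\,\frac{\partial \log f}{\partial\theta_k}\, f\,dx\,dy.$$

These are easily obtainable, being merely functions of moments of different orders.
Taking the parameters $\alpha, \beta, \sigma_1, \sigma_2, \rho$ in that order, we find for the Hessian (17.116)

$$\Delta = \begin{vmatrix}
\dfrac{1}{\sigma_1^{2}(1 - \rho^{2})} & -\dfrac{\rho}{\sigma_1\sigma_2(1 - \rho^{2})} & 0 & 0 & 0 \\[2ex]
-\dfrac{\rho}{\sigma_1\sigma_2(1 - \rho^{2})} & \dfrac{1}{\sigma_2^{2}(1 - \rho^{2})} & 0 & 0 & 0 \\[2ex]
0 & 0 & \dfrac{2 - \rho^{2}}{\sigma_1^{2}(1 - \rho^{2})} & -\dfrac{\rho^{2}}{\sigma_1\sigma_2(1 - \rho^{2})} & -\dfrac{\rho}{\sigma_1(1 - \rho^{2})} \\[2ex]
0 & 0 & -\dfrac{\rho^{2}}{\sigma_1\sigma_2(1 - \rho^{2})} & \dfrac{2 - \rho^{2}}{\sigma_2^{2}(1 - \rho^{2})} & -\dfrac{\rho}{\sigma_2(1 - \rho^{2})} \\[2ex]
0 & 0 & -\dfrac{\rho}{\sigma_1(1 - \rho^{2})} & -\dfrac{\rho}{\sigma_2(1 - \rho^{2})} & \dfrac{1 + \rho^{2}}{(1 - \rho^{2})^{2}}
\end{vmatrix}$$

This confirms, what we know already, that the distribution of means is independent of
variances and covariances. We may consider the 2 x 2 block in the top left-hand corner
and the 3 x 3 block in the bottom right-hand corner separately. If the determinants
of these blocks are $\Delta_1$ and $\Delta_2$, we have

$$\Delta_1 = \frac{1}{\sigma_1^{2}\sigma_2^{2}(1 - \rho^{2})}, \qquad \Delta_2 = \frac{4}{\sigma_1^{2}\sigma_2^{2}(1 - \rho^{2})^{3}}.$$

The minors will be found to be given by

$$\begin{matrix}
\dfrac{4}{\sigma_1^{2}\sigma_2^{4}(1 - \rho^{2})^{4}} & \dfrac{4\rho}{\sigma_1^{3}\sigma_2^{3}(1 - \rho^{2})^{4}} & 0 & 0 & 0 \\[2ex]
\dfrac{4\rho}{\sigma_1^{3}\sigma_2^{3}(1 - \rho^{2})^{4}} & \dfrac{4}{\sigma_1^{4}\sigma_2^{2}(1 - \rho^{2})^{4}} & 0 & 0 & 0 \\[2ex]
0 & 0 & \dfrac{2}{\sigma_1^{2}\sigma_2^{4}(1 - \rho^{2})^{4}} & \dfrac{2\rho^{2}}{\sigma_1^{3}\sigma_2^{3}(1 - \rho^{2})^{4}} & \dfrac{2\rho}{\sigma_1^{3}\sigma_2^{4}(1 - \rho^{2})^{3}} \\[2ex]
0 & 0 & \dfrac{2\rho^{2}}{\sigma_1^{3}\sigma_2^{3}(1 - \rho^{2})^{4}} & \dfrac{2}{\sigma_1^{4}\sigma_2^{2}(1 - \rho^{2})^{4}} & \dfrac{2\rho}{\sigma_1^{4}\sigma_2^{3}(1 - \rho^{2})^{3}} \\[2ex]
0 & 0 & \dfrac{2\rho}{\sigma_1^{3}\sigma_2^{4}(1 - \rho^{2})^{3}} & \dfrac{2\rho}{\sigma_1^{4}\sigma_2^{3}(1 - \rho^{2})^{3}} & \dfrac{4}{\sigma_1^{4}\sigma_2^{4}(1 - \rho^{2})^{2}}
\end{matrix}$$

Hence we find

$$\operatorname{var}\hat\alpha = \frac{\sigma_1^{2}}{n}, \quad \operatorname{var}\hat\beta = \frac{\sigma_2^{2}}{n}, \quad \operatorname{var}\hat\sigma_1 = \frac{\sigma_1^{2}}{2n}, \quad \operatorname{var}\hat\sigma_2 = \frac{\sigma_2^{2}}{2n}, \quad \operatorname{var}\hat\rho = \frac{(1 - \rho^{2})^{2}}{n}.$$

These results are already familiar. We have further:

$$\operatorname{cov}(\hat\sigma_1, \hat\sigma_2) = \frac{\rho^{2}\sigma_1\sigma_2}{2n}, \qquad \operatorname{cov}(\hat\alpha, \hat\beta) = \frac{\rho\,\sigma_1\sigma_2}{n},$$

$$\operatorname{cov}(\hat\rho, \hat\sigma_1) = \frac{\rho\,\sigma_1(1 - \rho^{2})}{2n}, \qquad \operatorname{cov}(\hat\rho, \hat\sigma_2) = \frac{\rho\,\sigma_2(1 - \rho^{2})}{2n}.$$

Hence the correlation between $\hat\sigma_1$ and $\hat\sigma_2$ is $\rho^{2}$, that between $\hat\alpha$ and $\hat\beta$ is $\rho$, and that between
$\hat\rho$ and $\hat\sigma_1$ or $\hat\sigma_2$ is $\rho/\sqrt{2}$.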
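The passage from the Hessian to the sampling variances can be checked by inverting the 3 x 3 block numerically. The sketch below is an illustration only: the helper names, the chosen parameter values and the use of the adjugate formula for the inverse are assumptions made here, and the matrix is taken per observation so that its inverse gives $n$ times the covariances of $\hat\sigma_1, \hat\sigma_2, \hat\rho$.

```python
import math

def info_block(s1, s2, rho):
    """3 x 3 block of the Hessian (17.116) for (sigma_1, sigma_2, rho), per observation."""
    d = 1.0 - rho * rho
    return [
        [(2.0 - rho ** 2) / (s1 * s1 * d), -rho ** 2 / (s1 * s2 * d), -rho / (s1 * d)],
        [-rho ** 2 / (s1 * s2 * d), (2.0 - rho ** 2) / (s2 * s2 * d), -rho / (s2 * d)],
        [-rho / (s1 * d), -rho / (s2 * d), (1.0 + rho ** 2) / (d * d)],
    ]

def inverse3(m):
    """Inverse of a 3 x 3 matrix by the adjugate (transposed cofactor) formula."""
    (a, b, c), (p, q, r), (u, v, w) = m
    det = a * (q * w - r * v) - b * (p * w - r * u) + c * (p * v - q * u)
    adj = [
        [q * w - r * v, c * v - b * w, b * r - c * q],
        [r * u - p * w, a * w - c * u, c * p - a * r],
        [p * v - q * u, b * u - a * v, a * q - b * p],
    ]
    return [[x / det for x in row] for row in adj]

# Illustrative parameter values (not from the text).
s1, s2, rho = 2.0, 3.0, 0.6
cov = inverse3(info_block(s1, s2, rho))   # n times the covariance matrix
corr_rho_s1 = cov[0][2] / math.sqrt(cov[0][0] * cov[2][2])
```

With these values `cov[0][0]` should equal $\sigma_1^2/2$, `cov[0][1]` should equal $\rho^2\sigma_1\sigma_2/2$, and `corr_rho_s1` should equal $\rho/\sqrt{2}$.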
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Example 17.19 
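Consider the Type III distribution

$$dF = \frac{1}{\sigma\,\Gamma(p)}\left(\frac{x - \alpha}{\sigma}\right)^{p-1}\exp\left\{-\left(\frac{x - \alpha}{\sigma}\right)\right\}dx, \qquad \alpha \le x \le \infty.$$

For the likelihood we have

$$\log L = -n\log\sigma - n\log\Gamma(p) + (p - 1)\,\Sigma\log\left(\frac{x - \alpha}{\sigma}\right) - \Sigma\left(\frac{x - \alpha}{\sigma}\right).$$

The three partial differential coefficients give

$$-(p - 1)\,\Sigma\frac{1}{x - \alpha} + \frac{n}{\sigma} = 0,$$

$$-\frac{np}{\sigma} + \frac{1}{\sigma^{2}}\,\Sigma(x - \alpha) = 0,$$

$$-n\,\frac{d\log\Gamma(p)}{dp} + \Sigma\log\left(\frac{x - \alpha}{\sigma}\right) = 0.$$

For the Hessian, taking the parameters in the order $\alpha, \sigma, p$, we have

$$\Delta = \begin{vmatrix}
\dfrac{1}{\sigma^{2}(p - 2)} & \dfrac{1}{\sigma^{2}} & \dfrac{1}{\sigma(p - 1)} \\[2ex]
\dfrac{1}{\sigma^{2}} & \dfrac{p}{\sigma^{2}} & \dfrac{1}{\sigma} \\[2ex]
\dfrac{1}{\sigma(p - 1)} & \dfrac{1}{\sigma} & \dfrac{d^{2}\log\Gamma(p)}{dp^{2}}
\end{vmatrix}
= \frac{1}{(p - 2)\sigma^{4}}\left\{2\,\frac{d^{2}\log\Gamma(p)}{dp^{2}} - \frac{2}{p - 1} + \frac{1}{(p - 1)^{2}}\right\}.$$

From this the sampling variances are found to be

$$\operatorname{var}\hat\alpha = \frac{1}{n\Delta\sigma^{2}}\left\{p\,\frac{d^{2}\log\Gamma(p)}{dp^{2}} - 1\right\},$$

$$\operatorname{var}\hat\sigma = \frac{1}{n\Delta\sigma^{2}}\left\{\frac{1}{p - 2}\,\frac{d^{2}\log\Gamma(p)}{dp^{2}} - \frac{1}{(p - 1)^{2}}\right\},$$

$$\operatorname{var}\hat{p} = \frac{2}{n\Delta(p - 2)\sigma^{4}}.$$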


Sufficient Estimators for Several Parameters 
17.47. As a natural generalisation from the case of one parameter we shall say that
$t_1 \ldots t_p$ are jointly sufficient for $\theta_1 \ldots \theta_p$ if, and only if, the likelihood function can
be expressed as

$$L(x_1 \ldots x_n, \theta_1 \ldots \theta_p) = L_1(t_1 \ldots t_p, \theta_1 \ldots \theta_p)\,L_2(x_1 \ldots x_n). \qquad (17.123)$$

It evidently does not follow that if $\theta_2 \ldots \theta_p$ are known $t_1$ is sufficient for $\theta_1$. This will
be true only if the function $L_1$ may itself be factorised, e.g.

$$L_1(t_1 \ldots t_p, \theta_1 \ldots \theta_p) = L_{11}(t_1, \theta_1 \ldots \theta_p)\,L_{12}(t_2 \ldots t_p, \theta_2 \ldots \theta_p). \qquad (17.124)$$

If a case occurred in which

$$L_1 = L_{11}(t_1, \theta_1)\,L_{12}(t_2, \theta_2) \ldots L_{1p}(t_p, \theta_p) \qquad (17.125)$$

we might say that each $t$ was sufficient for the corresponding $\theta$ or that the set of $t$'s was
completely sufficient for the $\theta$'s. Such cases, however, are very rare.


Example 17.20

From (17.113) it is evident that $\bar{x}$ and $s$ are jointly sufficient for $m$ and $\sigma$. If $\sigma$ is known
$\bar{x}$ is sufficient for $m$, but if $m$ is known $s$ is not sufficient for $\sigma$. The two are not completely
sufficient.


17.48. The properties of sufficient estimators may be proved true, with certain 
modifications, for several parameters, but we shall not take the subject further except 
to quote one result. 

If $f(x, \theta_1, \ldots \theta_p)$ is continuous and not zero over some continuous range of the $\theta$'s,
and z exists, then it is necessary and sufficient for the existence of a set of jointly sufficient. 
estimators that 


$$f = \exp\left\{\sum_{k=1}^{p} A_k X_k + B + Y\right\}, \qquad (17.126)$$

where $A_k$ and $B$ are arbitrary functions of the $\theta$'s and $X_k$ and $Y$ of $x$. (See Koopman, 1936.)
Example 17.21
The Type III distribution of Example 17.19 gives us

$$\log f = -p\log\sigma - \log\Gamma(p) + (p - 1)\log(x - \alpha) - \frac{x - \alpha}{\sigma}.$$

If $\alpha$ is regarded as known, this may be put in the form

$$-\frac{x - \alpha}{\sigma} + (p - 1)\log(x - \alpha) - p\log\sigma - \log\Gamma(p),$$

which is of type (17.126) with

$$A_1 = -\frac{1}{\sigma}, \qquad X_1 = x - \alpha,$$

$$A_2 = p - 1, \qquad X_2 = \log(x - \alpha),$$

$$B = -p\log\sigma - \log\Gamma(p).$$

Thus if $\alpha$ is known, there are sufficient estimators for $\sigma$ and $p$ jointly. It will be clear on
inspection that if $\alpha$ is unknown there are no sufficient estimators, even if $\sigma$ and $p$ are known.
Parameters of Location and Scale 

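17.49. Consider a frequency function expressed in the form

$$dF = g\left(\frac{x - \alpha}{\beta}\right)d\left(\frac{x - \alpha}{\beta}\right). \qquad (17.127)$$

The parameter $\alpha$ may be regarded as locating the distribution and $\beta$ as determining its
scale. In particular the normal distribution may be put in this form. We may write

$$dF = \exp\phi(\xi)\,d\xi = \exp\phi(\xi)\,\frac{dx}{\beta}, \qquad (17.128)$$

where $\xi = (x - \alpha)/\beta$ and $\phi(\xi) = \log g(\xi)$.
In samples of $n$ we have

$$\log L = \Sigma\,\phi - n\log\beta,$$

giving for the maximum likelihood estimators

$$\frac{\partial \log L}{\partial\alpha} = -\frac{1}{\beta}\,\Sigma\,\phi'(\xi) = 0, \qquad (17.129)$$

$$\frac{\partial \log L}{\partial\beta} = -\frac{1}{\beta}\,\Sigma\,\xi\phi'(\xi) - \frac{n}{\beta} = 0, \qquad (17.130)$$

whence we may solve for $\hat\alpha$ and $\hat\beta$.
For the variances and covariance we find

$$E\left(\frac{\partial^{2} \log L}{\partial\alpha^{2}}\right) = \frac{n}{\beta^{2}}\,E(\phi''),$$

$$E\left(\frac{\partial^{2} \log L}{\partial\alpha\,\partial\beta}\right) = \frac{n}{\beta^{2}}\,E(\xi\phi''),$$

$$E\left(\frac{\partial^{2} \log L}{\partial\beta^{2}}\right) = \frac{n}{\beta^{2}}\,\{E(\xi^{2}\phi'') - 1\}, \qquad (17.131)$$

from which the variances and covariance of $\hat\alpha$ and $\hat\beta$ may be determined in the usual way.

In (17.131) it would be a great convenience if the quantity $E(\xi\phi'')$ vanished, for
then $\hat\alpha$ and $\hat\beta$ would be independent. By a suitable choice of origin we can, in fact, ensure
that this is so. Put

$$\zeta = \xi - \frac{E(\xi\phi'')}{E(\phi'')}. \qquad (17.132)$$

Then

$$E(\xi\phi'') = E\left\{\zeta\phi'' + \phi''\,\frac{E(\xi\phi'')}{E(\phi'')}\right\} = E(\zeta\phi'') + E(\xi\phi''),$$

so that

$$E(\phi''\zeta) = 0.$$

With this origin we have for the variances of the (uncorrelated) variables $\hat\alpha$ and $\hat\beta$,

$$\operatorname{var}\hat\alpha = -\frac{\beta^{2}}{n\,E(\phi'')}, \qquad (17.133)$$

$$\operatorname{var}\hat\beta = -\frac{\beta^{2}}{n\{E(\phi''\zeta^{2}) - 1\}}. \qquad (17.134)$$

The point of location so defined, namely, as that for which $\hat\alpha$ and $\hat\beta$ are uncorrelated, has
been called by Fisher the centre of location.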
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Example 17.22 
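For the normal distribution

$$dF = \frac{1}{\beta\sqrt{(2\pi)}}\exp\left\{-\frac{1}{2}\left(\frac{x - \alpha}{\beta}\right)^{2}\right\}dx$$

we have $\phi = \text{constant} - \tfrac{1}{2}\xi^{2}$,

$$E(\phi'') = -1 \quad\text{and}\quad E(\xi\phi'') = 0.$$

Hence $\zeta = \xi$, and the origin chosen is itself the centre of location. From (17.133) and
(17.134) we find the familiar results (for large samples)

$$\operatorname{var}\hat\alpha = \operatorname{var}\bar{x} = \frac{\beta^{2}}{n},$$

$$\operatorname{var}\hat\beta = \operatorname{var} s = \frac{\beta^{2}}{2n},$$

with $\bar{x}$ and $s$ uncorrelated.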
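Example 17.23
Consider again the Type III distribution

$$dF = \frac{1}{\beta\,\Gamma(p)}\left(\frac{x - \alpha}{\beta}\right)^{p-1}\exp\left\{-\left(\frac{x - \alpha}{\beta}\right)\right\}dx, \qquad \alpha \le x \le \infty, \quad p > 2,$$

where we assume $p$ known. The condition $p > 1$ is required to ensure the vanishing of
the frequency function at the extremity $x = \alpha$, and $p > 2$ to ensure the convergence of
some of the mean values.

Here

$$\phi = \text{constant} - \xi + (p - 1)\log\xi.$$

Hence

$$\phi' = -1 + \frac{p - 1}{\xi}, \qquad \phi'' = -\frac{p - 1}{\xi^{2}},$$

$$E(\phi'') = -\frac{1}{p - 2},$$

$$E(\xi\phi'') = E\left(-\frac{p - 1}{\xi}\right) = -1,$$

$$E(\xi^{2}\phi'') = E\{-(p - 1)\} = -(p - 1).$$

Thus

$$\zeta = \xi - (p - 2).$$

The centre of location is distant $(p - 2)$ to the right of the start of the distribution. In
terms of $\zeta$ we have

$$\phi = \text{constant} - \zeta - (p - 2) + (p - 1)\log(\zeta + p - 2),$$

$$\phi'' = -\frac{p - 1}{(\zeta + p - 2)^{2}},$$

$$E(\phi'') = -\frac{1}{p - 2},$$

$$E(\phi''\zeta^{2} - 1) = -2.$$

Hence

$$\operatorname{var}\hat\alpha = \frac{\beta^{2}(p - 2)}{n},$$

$$\operatorname{var}\hat\beta = \frac{\beta^{2}}{2n}.$$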
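The mean values quoted in this example follow from the inverse moments of the Type III variate, $E(\xi^{-k}) = \Gamma(p - k)/\Gamma(p)$ for $p > k$. The sketch below is an illustration under that assumption; the function names are introduced here and are not part of the text.

```python
import math

def inverse_moment(p, k):
    """E(xi^-k) for the standardised Type III variate with index p > k."""
    return math.gamma(p - k) / math.gamma(p)

def centre_of_location(p):
    """Return the shift of origin and the variance factors of Example 17.23.

    The variance factors are in units of beta^2 / n.
    """
    e_phi2 = -(p - 1.0) * inverse_moment(p, 2)      # E(phi'')
    e_xi_phi2 = -(p - 1.0) * inverse_moment(p, 1)   # E(xi phi'')
    shift = e_xi_phi2 / e_phi2                      # zeta = xi - shift, from (17.132)
    # E(phi'' zeta^2) with zeta = xi - shift and phi'' = -(p - 1) / xi^2
    e_zeta2_phi2 = -(p - 1.0) * (
        1.0 - 2.0 * shift * inverse_moment(p, 1) + shift ** 2 * inverse_moment(p, 2)
    )
    var_alpha_factor = -1.0 / e_phi2                # (17.133)
    var_beta_factor = -1.0 / (e_zeta2_phi2 - 1.0)   # (17.134)
    return shift, var_alpha_factor, var_beta_factor

shift, var_alpha_factor, var_beta_factor = centre_of_location(5.0)
```

For any $p > 2$ the shift should equal $p - 2$, the factor for $\hat\alpha$ should equal $p - 2$ and the factor for $\hat\beta$ should equal $\tfrac{1}{2}$.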
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Efficiency of the Method of Moments 

17.50. In previous chapters we have fitted distributions of the Pearson type to 
other distributions by identifying lower moments. We were there mainly concerned with 
the properties of populations only and no question of the reliability of estimates arose. 
If, however, we regard the data as a sample from a population, the question arises whether 
fitting by moments provides the most efficient estimators of the unknown parameters. 
As we shall see presently, in general it does not. 

Consider a parent form dependent on four parameters. If the maximum likelihood
estimators of these parameters are to be obtained in terms of linear functions of the moments
(as in the fitting of Pearson curves), we must have

$$\frac{\partial \log L}{\partial\theta} = a_0 + a_1\Sigma(x) + a_2\Sigma(x^{2}) + a_3\Sigma(x^{3}) + a_4\Sigma(x^{4}) \qquad (17.135)$$

and consequently

$$f(x, \theta_1, \theta_2, \theta_3, \theta_4) = \exp\{b_0 + b_1 x + b_2 x^{2} + b_3 x^{3} + b_4 x^{4}\}, \qquad (17.136)$$

where the $b$'s depend on the $\theta$'s. This is the most general form for which the method of
moments gives maximum likelihood estimators. The $b$'s are, of course, conditioned by
the fact that the total frequency shall be unity and the distribution function converge.

Without loss of generality we may take $b_1 = 0$. If, then, the other $b$'s vanish except
$b_0$ and $b_2$ the distribution is normal and the method of moments is most-efficient. In
other cases, (17.136) does not yield a Pearson distribution except as an approximation.
For example,

$$\frac{d\log f}{dx} = 2b_2 x + 3b_3 x^{2} + 4b_4 x^{3}.$$

If $b_3$ and $b_4$ are small this is approximately

$$\frac{d\log f}{dx} = \frac{2b_2 x}{1 - \dfrac{3b_3}{2b_2}\,x - \dfrac{2b_4}{b_2}\,x^{2}}, \qquad (17.137)$$

which is one form of the equation defining Pearson distributions (cf. 6.2). Only when
$b_3$ and $b_4$ are small compared with $b_2$ can we expect the method of moments to give estimates
of high efficiency.


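17.51. A detailed discussion of the efficiency of moments in determining the parameters
of a Pearson distribution has been given by Fisher (1921a). We will here quote
only one of the results by way of illustration.
We found in Example 17.19 that the variance for large samples of the maximum
likelihood estimator $\hat{p}$ is given by

$$\operatorname{var}\hat{p} = \frac{2}{n\left\{2\,\dfrac{d^{2}\log\Gamma(p)}{dp^{2}} - \dfrac{2}{p - 1} + \dfrac{1}{(p - 1)^{2}}\right\}}$$

or, if $p' = p - 1$, by

$$\operatorname{var}\hat{p} = \frac{2}{n\left\{2\,\dfrac{d^{2}\log\Gamma(1 + p')}{dp'^{2}} - \dfrac{2}{p'} + \dfrac{1}{p'^{2}}\right\}}. \qquad (17.138)$$

Now for large $p'$ *

$$\log\Gamma(1 + p') = \tfrac{1}{2}\log 2\pi + (p' + \tfrac{1}{2})\log p' - p' + \frac{1}{12p'} - \frac{1}{360p'^{3}} + \frac{1}{1260p'^{5}} - \ldots$$

We then find

$$\frac{d^{2}}{dp'^{2}}\log\Gamma(1 + p') = \frac{1}{p'} - \frac{1}{2p'^{2}} + \frac{1}{6p'^{3}} - \frac{1}{30p'^{5}} + \ldots$$

and hence, approximately,

$$\operatorname{var}\hat{p} = \frac{2}{n\left\{\dfrac{1}{3p'^{3}} - \dfrac{1}{15p'^{5}}\right\}},$$

$$\operatorname{var}\hat{p} = \frac{6}{n}\left(p'^{3} + \tfrac{1}{5}p'\right). \qquad (17.139)$$

If we estimate the parameters by equating sample-moments to the appropriate moments
in terms of parameters, we find

$$\alpha + \sigma p = m_1',$$

$$\sigma^{2} p = m_2,$$

$$2\sigma^{3} p = m_3,$$

so that, whatever $\alpha$ and $\sigma$ may be,

$$p = \frac{4}{b_1}, \qquad (17.140)$$

where $b_1$ is the sample value of $\beta_1$. Now for estimation by the method of moments (cf.
9.22),

$$\operatorname{var} b_1 = \frac{\beta_1}{n}\,(4\beta_4 - 24\beta_2 + 36 + 9\beta_1\beta_2 - 12\beta_3 + 35\beta_1),$$

which for the present distribution is readily seen to reduce to

$$\operatorname{var} b_1 = \frac{\beta_1}{n}\cdot\frac{24(p + 1)(p + 5)}{p^{2}}. \qquad (17.141)$$

Hence, from (17.140) we have for $p$, estimated by the method of moments,

$$\operatorname{var} p = \frac{p^{4}}{16}\operatorname{var} b_1 = \frac{6}{n}\,p(p + 1)(p + 5).$$

For large $p$ the efficiency of this estimator is then, from (17.139) with $p = 1 + p'$,

$$\frac{p'^{3} + \tfrac{1}{5}p'}{(p' + 1)(p' + 2)(p' + 6)},$$

which is evidently short of unity in many cases. When $p'$ exceeds 38.1 ($\beta_1 = 0.102$) the
efficiency is over 80 per cent. For $p' = 19$ ($\beta_1 = 0.20$) it is 65 per cent. For $p' = 4$ a more
exact calculation based on the tables of the trigamma function $\dfrac{d^{2}}{dp'^{2}}\log\Gamma(1 + p')$ shows
that the efficiency is only 22 per cent.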
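The efficiencies quoted above can be recomputed without tables. The sketch below is an illustration under assumptions made here: the trigamma function is evaluated by the recurrence $\psi'(z) = \psi'(z + 1) + 1/z^{2}$ followed by its standard asymptotic series, and the function names are hypothetical. Variances are expressed in units of $1/n$.

```python
def trigamma(z):
    """Second derivative of log Gamma(z) for z > 0.

    The argument is raised above 10 by the recurrence and the asymptotic
    series is then applied.
    """
    total = 0.0
    while z < 10.0:
        total += 1.0 / (z * z)
        z += 1.0
    inv = 1.0 / z
    inv2 = inv * inv
    series = inv + 0.5 * inv2 + inv * inv2 * (
        1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 * (1.0 / 42.0 - inv2 / 30.0))
    )
    return total + series

def var_ml(p):
    """n times the large-sample variance of the maximum likelihood estimator of p."""
    return 2.0 / (2.0 * trigamma(p) - 2.0 / (p - 1.0) + 1.0 / (p - 1.0) ** 2)

def var_moments(p):
    """n times the large-sample variance of p estimated by moments."""
    return 6.0 * p * (p + 1.0) * (p + 5.0)

def efficiency_exact(p_dash):
    """Efficiency of the moment estimator, with p = 1 + p'."""
    p = 1.0 + p_dash
    return var_ml(p) / var_moments(p)

def efficiency_approx(p_dash):
    """Large-p' approximation based on (17.139)."""
    return (p_dash ** 3 + p_dash / 5.0) / ((p_dash + 1.0) * (p_dash + 2.0) * (p_dash + 6.0))
```

Evaluating `efficiency_approx` at $p' = 38.1$ and $p' = 19$, and `efficiency_exact` at $p' = 4$, should reproduce the figures of about 80, 65 and 22 per cent given in the text.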


* The series for the log $\Gamma$ function is given in most books on advanced calculus, e.g. J. Edwards,
Integral Calculus, vol. 2, article 942.
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NOTES AND REFERENCES 


The greater part of this chapter is based on the researches of R. A. Fisher, the main
papers being those of 1921a, 1925b and 1934a. The idea of maximising likelihood may
be traced back to Gauss and was considered by Edgeworth, but may be regarded as beginning
to exercise an influence on statistical theory only with the publication of Fisher's
first paper in 1912.

The theorem giving the limiting variances and covariances of maximum likelihood
estimates was proved (incorrectly) by Karl Pearson and Filon in 1898 before it was realised
that it applied only to maximum likelihood. The necessary correction was given by Edgeworth
(1908) and Fisher (1921a), but rigorous proofs were not available until the work of
Hotelling (1930) and Doob (1934a and b, 1935, 1936). In the text we have followed
Hotelling's treatment.

The inefficiency of moments in fitting distributions, pointed out by Fisher (1921a),
has led to some controversy, for which see Koshal (1933, 1935), Myers (1934), Elderton
and Hansmann (1934), K. Pearson (1936), and Fisher (1937a). The reader who pursues
this subject so far as to read any one of these papers should read them all.

For work on sufficient estimators see Koopman (1936) and Pitman (1936, 1937b), who
independently obtained the general form of distribution admitting such estimators. The
theorem that sufficient estimators have the property 17.17 is due to Fisher, rigorous proofs
being provided by Neyman (1935a) and Dugué (1936a). Reference should also be made
to papers by Bartlett (1936a, b, 1937c, 1938b, 1939a, 1940) on the problem of several parameters
and what he calls “conditional” statistics, i.e. those similar to $s^{2}$ when $\bar{x}$ or some
other function of the sample values is regarded as known. See also Neyman and Pearson
(1936a).

Among recent papers, that by Pitman (1939a) on parameters of scale and location,
and that by Welch (1939c) on the distribution of maximum likelihood estimates, are
noteworthy.

Geary (1942) has recently proved a remarkable generalisation of the theorem that
in large samples maximum-likelihood estimators have minimum variance in the case of
one parameter. In fact, for several parameters the maximum likelihood estimators
minimise the “generalised variance” as defined in Chapter 28.


EXERCISES
17.1. If $t$ is a most-efficient estimator and $t'$ a less-efficient estimator with efficiency
$E$, and if the correlation of $t$ and $t'$ is $\rho$, show by considering the estimator $t''$ defined by

$$(1 + E - 2\rho\sqrt{E})\,t'' = (1 - \rho\sqrt{E})\,t + (E - \rho\sqrt{E})\,t'$$

that $\rho = \sqrt{E}$ (for in the contrary case var $t''$ < var $t$).
(Fisher, 1925b.)


17.2. If in $n$ trials of an event with probability $p$ there are $x$ successes, show that
a maximum likelihood estimator of $p$ is $x/n$. Find its sampling variance and show that
it is sufficient.

17.3. Show that the distribution

$$dF = \tfrac{1}{2}\exp\{-|x - \theta|\}\,dx, \qquad -\infty \le x \le \infty,$$

has a likelihood function for a sample of $n$ which is a maximum at the median if $n$ is odd
and between the $(n/2)$th and $(n/2 + 1)$th members if $n$ is even.


y 
17.4. For the distribution of the previous exercise show that for a sample of (2m + 1) E 


members the median has an accuracy ; 
- (m 4- 1) (2m 4- 1) r= (2m) ! i 
(m — 1) 32n-1 (ml)? f’ 


Hence, as m tends to infinity, the loss of information tends to 4 (m/z) — 4. Thus, 
although the median is most-efficient the loss of information in large samples does not 
tend to a poe. 


(Fisher, 1925b.)


17.5. Show that if a most-efficient estimator A and a less-efficient estimator B tend
to joint normality for large samples, B − A tends to zero correlation with A.
Show that the error in B may be regarded as composed (for large samples) of two
parts which are independent, the error in A and the error in B − A. (The first may be
regarded as sampling error, necessarily inherent in the problem of estimation, the second
as error due to the inefficiency of the estimator.)
(Fisher, 1925b.)


17.6. .Show that the distribution of the median in a sample of (2m + 1) observations 
from the population u 
I div, Eh ds | 
A en ERIT eE ED E Ies 
-is given by 
45 AR ap = Qn 0! ZEN. da: 
et (m !)? nemti | 4 1+ (x — 6)” 
where tan ġ =x —0 and |¢| «iz. 


Show hence that the accuracy of the median is 


š ee [ [me cos? ¢ + (T = e) sin Ju (€ e e) ag 
ERE Oe Cats oan 


(m +4)! (1\™+* 2 2 
t cuml (3) {oy Jm- (22) — TI 4.4402] 


where J, (2) is the Bessel function of order n and in particular J, (x) = J, (2x) = 0, 
m js LI i 
Ji (n) — J; pg) Ear? and 


2 
Jui M ZI, = Jua 
(Fisher, 1925b.)
17.7. Show that the most general continuous distribution for which the maximum
likelihood estimator of a parameter $\theta$ is the geometric mean of the sample is

$$f(x, \theta) = \left( \frac{x}{\theta} \right)^{\theta \psi'(\theta)} \exp\{ \psi(\theta) + \xi(x) \},$$

where $\psi$ is an arbitrary function of $\theta$, and $\xi$ of x. Show further that the corresponding
distribution giving the harmonic mean is


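$$f(x, \theta) = \exp\left[ \theta^2 \psi'(\theta) \left( \frac{1}{\theta} - \frac{1}{x} \right) + \psi(\theta) + \xi(x) \right].$$
(Keynes, J.R.S.S. (1911), 74, 323.)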


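17.8. Show that, if m is known, the estimator

$$s = \sqrt{ \frac{1}{n} \sum (x - m)^2 }$$

is sufficient for $\sigma$ in samples of n from

$$dF = \frac{1}{\sigma \sqrt{2\pi}} \exp\left\{ -\frac{(x - m)^2}{2\sigma^2} \right\} dx, \quad -\infty \le x \le \infty,$$
and find its distribution by the method of 17.31.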


17.9. By considering the distribution
$$dF = e^{\theta - x} \, dx, \quad \theta \le x \le \infty,$$
show that the three forms of (17.97) are not necessarily equivalent when the range contains
the parameter to be estimated.
(Pitman, 1936.)


17.10. Show that if the frequency function is continuous and is zero at an extreme
which is a function of $\theta$, there still exists a maximum to the intrinsic accuracy, defined
as $E\left( \dfrac{\partial \log f}{\partial \theta} \right)^2$.
(Pitman, 1936.)


17.11. By considering the distribution 
2x 
E 0+1 
dF WF? 0 <x <0 + Bd 
show that the intrinsic accuracy is $4n^2/(2\theta + 1)^2$. Show further that the largest member
of the sample is sufficient for $\theta$ and that its distribution is


Ina (a? d 2)n-1 


dF = a (x) dë = 0-1)" 
Hence show that 
E Aloga\* _ 4n?(0 + 1)? 4n0? 
( 00 (20 + 1)? (n — 2) (20 + 1)” 


so that the mean value in this case is greater than the intrinsic accuracy. 
(Pitman, 1936.)

17.12. If the frequency function of an estimator t is $\phi$ its accuracy is $E\left\{ \left( \dfrac{1}{\phi} \dfrac{\partial \phi}{\partial \theta} \right)^2 \right\}$.
If every possible sample with frequency f gave a different value of t the accuracy would
be $E\left\{ \left( \dfrac{1}{f} \dfrac{\partial f}{\partial \theta} \right)^2 \right\}$ and would be independent of t. Show that the difference in accuracy
may be expressed as

$$E\left\{ \left( \frac{1}{f} \frac{\partial f}{\partial \theta} - \frac{1}{\phi} \frac{\partial \phi}{\partial \theta} \right)^2 \right\}$$

and hence is not negative.
Hence show that the efficiency as defined in 17.36 cannot exceed unity, at least if the
range is independent of $\theta$.
(Fisher, 1925b.)

17.13. Show that 

" 6, dæ 

F=- aT A = < 
. ‘ OF RET (o A ey eit 
does not admit of a sufficient estimator for either parameter if the other is known, or
a pair of jointly sufficient estimators if both are unknown.
(Koopman, 1936.)


17.14. Show that if a distribution admits a sufficient estimator for either of two
parameters when the other is known, it admits of a pair of jointly sufficient estimators
when both parameters are unknown.
(Koopman, 1936.)


17.15. Show that the centre of location of the Type IV distribution 


" 


TP 
dF eene + (272) "| \ — c &z « o 
"where v and p are re assumed ‘known, is distant Pee di q to the left of the mode of the distribution. 
(Fisher, 1921a.)

17.16. For the distribution
$$dF = \frac{dx}{\theta}, \quad -\tfrac{1}{2}\theta \le x \le \tfrac{1}{2}\theta,$$
show that, in large samples, the mean tends to the form

$$dF = \frac{1}{\theta} \sqrt{\frac{6n}{\pi}} \exp\left( -\frac{6n\bar{x}^2}{\theta^2} \right) d\bar{x}.$$

Show further that the distribution of the centre of the sample, say c (the mean of the two
extreme values), tends to

$$dF = \frac{n}{\theta} \exp\left\{ -\frac{2n}{\theta} |c| \right\} dc.$$

Hence
$$\frac{\mathrm{var}\, c}{\mathrm{var}\, \bar{x}} = \frac{6}{n},$$
so that the centre is a far better estimator of location than the mean for this distribution.
(Fisher, 1921a.)
17.17. Show that for the Type I distribution

$$dF = \frac{1}{B(p, q)} x^{p-1} (1 - x)^{q-1} \, dx, \quad 0 \le x \le 1,$$

the geometric mean of the sample values x and that of the values (1 - x) are jointly
sufficient for the estimation of p and q.


17.18. Show that all the Pearson distributions have sufficient estimators for some
of the parameters if the others are assumed known and ascertain which are the parameters
concerned for each type.


17.19. For the distribution of Exercise 17.15 show that the intrinsic accuracy for the location parameter is
$$\frac{1}{a^2} \frac{(p + 1)(p + 2)(p + 4)}{(p + 4)^2 + \nu^2}$$
and that the efficiency of the method of moments in locating the curve is
$$\frac{p^2 (p - 1) \{ (p + 4)^2 + \nu^2 \}}{(p + 1)(p + 2)(p + 4)(p^2 + \nu^2)}.$$
(Fisher, 1921a.)
CHAPTER 18
ESTIMATION: MISCELLANEOUS METHODS


Minimum Variance

18.1. We have seen in the previous chapter that under certain general conditions
the maximum likelihood estimator is most-efficient for large samples, and that for finite
samples it leads to sufficient estimators where such exist. Sufficient estimators themselves
contain all the information in the sample about the parameter under estimate. What
we have not shown, however, is that maximum likelihood estimators have minimum variance
in finite samples.

We now consider the subject from a slightly different standpoint. Instead of beginning
with the criteria of efficiency and sufficiency and showing that they lead to certain
minimal properties, we shall examine the class of estimators which (a) are unbiassed and
(b) have minimum variance. The minimal property is here taken as the starting-point.


18.2. Consider, then, a frequency function $f(x, \theta)$, and as usual let us write
$L = f(x_1, \theta) \ldots f(x_n, \theta)$. Then, writing $\int dx$ for the n-fold integral over the range
of the x's, we have to find $t = t(x_1, \ldots, x_n)$ such that

$$\int t L \, dx = \theta, \qquad (18.1)$$

$$\int (t - \theta)^2 L \, dx = \text{minimum}. \qquad (18.2)$$

The first equation may also be written

$$\int (t - \theta) L \, dx = 0. \qquad (18.3)$$

The problem of finding t is one of the familiar problems in the Calculus of Variations. The
minimal value of (18.2) has to be found subject to the condition (18.1), which, on
differentiating (18.3) with respect to $\theta$, is seen to be equivalent to

$$\int (t - \theta) \frac{\partial L}{\partial \theta} \, dx = 1, \qquad (18.4)$$

provided that the range of f is independent of $\theta$ or that f vanishes at any extreme which
depends on $\theta$.

If $2\lambda$ is an unspecified parameter (which may depend on $\theta$ but not on the x's) the
problem is equivalent to finding an unconditioned minimum of

$$\int \left\{ (t - \theta)^2 L - 2\lambda (t - \theta) \frac{\partial L}{\partial \theta} \right\} dx. \qquad (18.5)$$

The solution is *

$$2 (t - \theta) L - 2\lambda \frac{\partial L}{\partial \theta} = 0,$$

or

$$(t - \theta) L - \lambda \frac{\partial L}{\partial \theta} = 0. \qquad (18.6)$$

We then have

$$t - \theta = \lambda \frac{1}{L} \frac{\partial L}{\partial \theta} = \lambda \frac{\partial \log L}{\partial \theta}, \qquad (18.7)$$

where t is a function of the x's but not of $\theta$. Thus there exists a t satisfying our conditions
if we can express $\partial \log L / \partial \theta$ in the form

$$\frac{\partial \log L}{\partial \theta} = \frac{t - \theta}{\lambda}. \qquad (18.8)$$

This is a necessary and sufficient condition, except that it gives only stationary values of
(18.2) which might, for instance, be maxima instead of minima. This is not a point,
however, which need detain us from the statistical viewpoint, troublesome as it is to the
mathematician.

* See, for example, J. Edwards, Integral Calculus, vol. 2, article 1504, or A. R. Forsyth, Calculus
of Variations, article 15. Since the expression to be minimised does not contain $\partial t / \partial x$ the Euler equation
for a stationary value to the integral $\int V \, dx$ reduces to $\partial V / \partial t = 0$. The derivation of (18.7) is not,
however, without its difficulties, and I think some conditions have been accidentally suppressed in
the Aitken-Silverstone method. I understand that Dr. Leon Solomon, working with Dr. Aitken, has
obtained a proof which depends on the fact that L shall be the product of n independent frequency
functions. But for the war the point would doubtless have been cleared up by now, but at present
it remains open.


Example 18.1
To estimate $\theta$ in the normal population

$$dF = \frac{1}{\sigma \sqrt{2\pi}} \exp\left\{ -\frac{(x - \theta)^2}{2\sigma^2} \right\} dx, \quad -\infty \le x \le \infty,$$
where $\sigma$ is assumed known.
We have
$$\frac{\partial \log L}{\partial \theta} = \frac{n}{\sigma^2} (\bar{x} - \theta).$$
This can be put in the form (18.8) by taking
$$\lambda = \frac{\sigma^2}{n} \quad \text{and} \quad t = \bar{x},$$
and hence $\bar{x}$ is the required estimator. We note that it has minimum variance for any
n in the class of unbiassed estimators of $\theta$.


Example 18.2
To estimate $\theta$ in

$$dF = \frac{1}{\pi} \frac{dx}{1 + (x - \theta)^2}, \quad -\infty \le x \le \infty.$$

We have

$$\frac{\partial \log L}{\partial \theta} = 2 \sum \frac{x - \theta}{1 + (x - \theta)^2}.$$

This cannot be put in the form (18.8) and the method fails. There is no estimator which
is unbiassed and has minimum variance.
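
The contrast between Examples 18.1 and 18.2 can be checked numerically. The following is only a sketch under stated assumptions: the sample values, the value of sigma and the grid are hypothetical, and the function names are ours. It relies on two consequences of (18.8): the quantity theta + lambda * (d log L / d theta) must be the same statistic t whatever theta is, and (for positive lambda) the derivative can change sign only once, at theta = t.

```python
def normal_score(xs, theta, sigma):
    """d(log L)/d(theta) for a normal sample with known sigma (Example 18.1)."""
    return sum(x - theta for x in xs) / sigma ** 2


def cauchy_score(xs, theta):
    """d(log L)/d(theta) for the population of Example 18.2."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in xs)


# Hypothetical sample; sigma is taken as known.
xs = [1.2, -0.4, 2.5, 0.9, 1.8]
sigma = 1.5
lam = sigma ** 2 / len(xs)

# If (18.8) holds, theta + lam * score is the same statistic t for every theta.
t_values = [theta + lam * normal_score(xs, theta, sigma) for theta in (-1.0, 0.0, 2.0)]

# Under (18.8) with lam > 0 the score changes sign once only, at theta = t.
# For a widely separated pair of observations the score of Example 18.2
# changes sign more than once, so no representation (18.8) is possible.
pair = [-10.0, 10.0]
grid = [-15.005 + 0.01 * k for k in range(3001)]
scores = [cauchy_score(pair, theta) for theta in grid]
sign_changes = sum(1 for a, b in zip(scores, scores[1:]) if a * b < 0)

print(t_values, sign_changes)
```

For the normal sample every entry of `t_values` is the sample mean; for the pair of observations the count of sign changes exceeds one.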
18.3. Integrating (18.8) with respect to $\theta$ we have
$$\log L = \alpha(\theta)(t - \theta) + \beta(\theta) + \sum \gamma(x),$$
where $\alpha$, $\beta$, $\gamma$ are arbitrary functions (apart from the fact that the two former depend on
$\lambda$). Hence
$$\log f(x, \theta) = A(\theta)(t - \theta) + B(\theta) + C(x)$$
$$= p(\theta)\, t(x) + q(\theta) + r(x), \text{ say.} \qquad (18.9)$$
Comparing this with (17.83), we see that the method of minimum variance will give a
solution only if there exists a sufficient estimator. This explains the success of the method
in Example 18.1 (where $\bar{x}$ is sufficient) and its failure in Example 18.2 (where no sufficient
estimator exists).


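18.4. In the method of maximum likelihood it makes no difference to the final
result whether we estimate for a parameter $\theta$ or for some other parameter $\psi$ functionally
related to $\theta$. For

$$\frac{\partial \log L}{\partial \theta} = \frac{\partial \log L}{\partial \psi} \frac{\partial \psi}{\partial \theta}$$

and the two sides of the equation vanish together. In the method of minimum variance,
however, there is an interesting difference.

Suppose we wish to estimate $\theta$ in

$$dF = \frac{1}{\sqrt{2\pi\theta}} \exp\left( -\frac{x^2}{2\theta} \right) dx, \quad -\infty \le x \le \infty.$$
We have
$$\frac{\partial \log L}{\partial \theta} = -\frac{n}{2\theta} + \frac{\sum (x^2)}{2\theta^2},$$
and this may be put in the form (18.8) with
$$t = \frac{\sum (x^2)}{n}, \quad \lambda = \frac{2\theta^2}{n}.$$
If, however, we consider the parallel problem of estimating $\sigma$ in
$$dF = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{x^2}{2\sigma^2} \right) dx, \quad -\infty \le x \le \infty,$$
we find
$$\frac{\partial \log L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{\sum (x^2)}{\sigma^3},$$

which cannot be put in the form (18.8). We thus reach the peculiar result that the method
will provide an estimator for $\sigma^2$ but not for $\sigma$. It follows that in general we may have
to estimate, not $\theta$ itself, but some function of $\theta$, say $\tau(\theta)$.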
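
A numerical check of this difference may be helpful. The sketch below (hypothetical sample values; the helper names are ours) differentiates the log-likelihood by central differences and confirms that, in the variance parametrisation, the derivative agrees with (t - theta)/lambda for t = sum(x^2)/n and lambda = 2 theta^2 / n. In the sigma parametrisation the same statistic is recovered only as sigma^2 + (sigma^3 / n) times the derivative, that is to say the derivative is centred on sigma^2 and not on sigma.

```python
import math


def log_lik_variance(xs, theta):
    """Log-likelihood of a zero-mean normal sample; theta is the variance."""
    n = len(xs)
    return -0.5 * n * math.log(2.0 * math.pi * theta) - sum(x * x for x in xs) / (2.0 * theta)


def log_lik_sigma(xs, sigma):
    """The same log-likelihood expressed in terms of the standard deviation."""
    return log_lik_variance(xs, sigma ** 2)


def central_difference(f, v, h=1e-6):
    """Numerical derivative of f at v."""
    return (f(v + h) - f(v - h)) / (2.0 * h)


xs = [0.5, -1.2, 2.0, 0.3, -0.7, 1.1]   # hypothetical observations
n = len(xs)
t = sum(x * x for x in xs) / n

# Variance parametrisation: the score equals (t - theta) / lam, as in (18.8).
variance_checks = []
for theta in (0.5, 1.0, 3.0):
    score = central_difference(lambda v: log_lik_variance(xs, v), theta)
    lam = 2.0 * theta ** 2 / n
    variance_checks.append((score, (t - theta) / lam))

# Sigma parametrisation: the score is (n / sigma^3) * (t - sigma^2),
# which is centred on sigma^2 rather than on sigma itself.
sigma_checks = []
for sigma in (0.7, 1.0, 1.8):
    score = central_difference(lambda v: log_lik_sigma(xs, v), sigma)
    sigma_checks.append(sigma ** 2 + (sigma ** 3 / n) * score)

print(variance_checks)
print(sigma_checks, t)
```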


18.5. If a minimum-variance estimator exists for some $\tau(\theta)$ we must have
$$\frac{\partial \log L}{\partial \tau} = \frac{t - \tau}{\lambda},$$

which is equivalent to

$$\frac{\partial \log L}{\partial \theta} \Big/ \frac{\partial \tau}{\partial \theta} = \frac{t - \tau}{\lambda},$$

or
$$\frac{\partial \log L}{\partial \theta} = \frac{(t - \tau)}{\lambda} \frac{\partial \tau}{\partial \theta}. \qquad (18.10)$$

We estimate $\tau$ by putting it equal to t and thus we shall have, for the estimator,

$$\left( \frac{\partial \log L}{\partial \theta} \right)_{\tau = t} = 0. \qquad (18.11)$$

This is equivalent to the equation of maximum likelihood. The two are not, however,
identical. Maximum likelihood is not concerned with the existence of the function $\lambda$.
Minimum variance takes the function as fundamental, and when it exists the solution
(which is the same as the maximum likelihood solution) has minimum variance for all n
in the class of unbiassed estimators, not merely for large n.


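18.6. Let us suppose that $\theta$ is the parameter (transformed if necessary) for which
the estimating function is $\theta$ itself. Then we have for the minimum-variance estimator t

$$\mathrm{var}\, t = \int (t - \theta)^2 L \, dx,$$

which, on substitution from (18.8), yields

$$\mathrm{var}\, t = \lambda^2 \int \left( \frac{\partial \log L}{\partial \theta} \right)^2 L \, dx \qquad (18.12)$$

$$= -\lambda^2 \int \left( \frac{\partial^2 \log L}{\partial \theta^2} \right) L \, dx, \qquad (18.13)$$

if the range is independent of $\theta$ or f vanishes at any extreme dependent on $\theta$.
Now from (18.8) we find

$$\frac{\partial^2 \log L}{\partial \theta^2} = -\frac{1}{\lambda} + (t - \theta) \frac{\partial}{\partial \theta} \left( \frac{1}{\lambda} \right),$$

and hence, substituting in (18.13) and remembering that $\int (t - \theta) L \, dx = 0$, we find

$$\mathrm{var}\, t = -\lambda^2 \int \left( -\frac{1}{\lambda} \right) L \, dx = \lambda. \qquad (18.14)$$

The variance of the minimum-variance estimator is thus simply the parameter $\lambda$. It also
follows from (18.13) that

$$\frac{1}{\mathrm{var}\, t} = -\int \left( \frac{\partial^2 \log L}{\partial \theta^2} \right) L \, dx = -E\left( \frac{\partial^2 \log L}{\partial \theta^2} \right), \qquad (18.15)$$

so that the result we reached in Chapter 17, as a limiting form for large n, is now seen to
be exact for finite n under present conditions.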


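Example 18.3
To estimate $\theta$ in the Type III form

$$dF = \frac{1}{\Gamma(p)\, \theta^p} x^{p-1} e^{-x/\theta} \, dx, \quad 0 \le x \le \infty, \quad p > 1,$$

where p is assumed known.
We have
$$\frac{\partial \log L}{\partial \theta} = -\frac{np}{\theta} + \frac{n\bar{x}}{\theta^2},$$
which is of the form (18.8) if
$$t = \frac{\bar{x}}{p} \quad \text{and} \quad \lambda = \frac{\theta^2}{np}.$$

Thus $\bar{x}/p$ is a minimum-variance estimator of $\theta$, with variance $\theta^2/(np)$, although
the distribution is not normal. (Compare Example 17.8.)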
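
The variance $\lambda = \theta^2/(np)$ implied by (18.14) for the estimator $\bar{x}/p$ of this example can be checked by simulation. The following is a sketch only: the values of p, theta and n, the seed and the number of replications are all hypothetical choices, and the standard-library generator `random.gammavariate(alpha, beta)` is used with `beta` as the scale parameter.

```python
import random

rng = random.Random(2024)          # fixed seed so that the run is repeatable
p, theta, n = 3.0, 2.0, 5          # hypothetical parameter values and sample size
replications = 20000

estimates = []
for _ in range(replications):
    sample = [rng.gammavariate(p, theta) for _ in range(n)]
    estimates.append(sum(sample) / (n * p))      # t = (sample mean) / p

mean_t = sum(estimates) / replications
var_t = sum((t - mean_t) ** 2 for t in estimates) / (replications - 1)
lam = theta ** 2 / (n * p)                       # variance predicted by (18.14)

print(mean_t, var_t, lam)
```

The simulated mean should lie close to theta (the estimator is unbiassed) and the simulated variance close to `lam`, even for so small a sample as n = 5.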


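18.7. We may readily determine what function $\tau(\theta)$ should be taken as the estimating
function. Taking the general form from (18.9),

$$\log f(x, \theta) = p(\theta)\, t(x) + q(\theta) + r(x),$$

we have
$$\log L = p \sum t(x) + nq + \sum r(x),$$
$$\frac{\partial \log L}{\partial \theta} = \frac{\partial p}{\partial \theta} \sum t(x) + n \frac{\partial q}{\partial \theta}$$
$$= n \frac{\partial p}{\partial \theta} \left\{ \frac{1}{n} \sum t(x) + \frac{\partial q}{\partial \theta} \Big/ \frac{\partial p}{\partial \theta} \right\}. \qquad (18.16)$$
Hence, if
$$\tau = -\frac{\partial q}{\partial \theta} \Big/ \frac{\partial p}{\partial \theta}, \qquad (18.17)$$
we have
$$\frac{\partial \log L}{\partial \tau} = \frac{\dfrac{1}{n} \sum (t) - \tau}{\dfrac{1}{n} \dfrac{\partial \tau}{\partial p}}, \qquad (18.18)$$
which is of the required form provided that
$$\lambda = \frac{1}{n} \frac{\partial \tau}{\partial p}. \qquad (18.19)$$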
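Example 18.4
Consider again the estimation of $\sigma$ in
$$dF = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{x^2}{2\sigma^2} \right) dx, \quad -\infty \le x \le \infty.$$
Here
$$\log f = -\tfrac{1}{2} \log (2\pi) - \log \sigma - \frac{x^2}{2\sigma^2},$$
whence
$$p(\sigma) = -\frac{1}{2\sigma^2}, \quad t(x) = x^2, \quad q = -\log \sigma.$$
Thus the appropriate value of $\tau$, from (18.17), is
$$\tau = -\frac{\partial q}{\partial \sigma} \Big/ \frac{\partial p}{\partial \sigma} = \sigma^2,$$
which is thus determined as our estimating function. For the variance of the estimator
of $\tau$ we have
$$\lambda = \frac{1}{n} \frac{\partial \tau}{\partial p} = \frac{2\sigma^4}{n},$$
the estimator itself being $\dfrac{1}{n} \sum (x^2)$.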
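
The recipe of (18.17) and (18.19) lends itself to a mechanical check. In the sketch below the functions p and q are those of Example 18.4, while the finite-difference helper, the step length and the values of sigma and n are our own assumptions; the derivative of tau with respect to p is obtained as (d tau / d sigma) divided by (d p / d sigma).

```python
import math


def derivative(f, v, h=1e-4):
    """Central-difference approximation to f'(v)."""
    return (f(v + h) - f(v - h)) / (2.0 * h)


def p(sigma):
    """Coefficient of t(x) = x^2 in log f for the normal form of Example 18.4."""
    return -1.0 / (2.0 * sigma ** 2)


def q(sigma):
    return -math.log(sigma)


def tau(sigma):
    """Estimating function from (18.17): tau = -(dq/dsigma) / (dp/dsigma)."""
    return -derivative(q, sigma) / derivative(p, sigma)


def lam(sigma, n):
    """Variance from (18.19): (1/n) d(tau)/dp, by the chain rule through sigma."""
    return derivative(tau, sigma) / derivative(p, sigma) / n


sigma, n = 1.5, 10                 # hypothetical values
tau_value = tau(sigma)             # should agree with sigma ** 2
lam_value = lam(sigma, n)          # should agree with 2 * sigma ** 4 / n

print(tau_value, sigma ** 2)
print(lam_value, 2.0 * sigma ** 4 / n)
```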


Minimum $\chi^2$
18.8. We now turn to consider another principle which has been suggested for providing
estimators. If the data are grouped into cells with expected frequency typified
by $\lambda_j$ and observed frequency by $l_j$, then the function

$$\chi^2 = \sum_j \frac{(l_j - \lambda_j)^2}{\lambda_j}, \qquad (18.20)$$

where
$$n = \sum (\lambda_j) = \sum (l_j), \qquad (18.21)$$
can, as we saw in Chapter 12, be used as a measure of closeness of fit. The method of
minimum $\chi^2$ adopts this standpoint (which is, of course, arbitrary in the logical sense)
and attempts to determine the parameters such that $\chi^2$ is a minimum.

In practice the method is not very easy to apply because of the difficulty of expressing
the $\lambda$'s in terms of the parameter under estimate, $\theta$. For some illustrations reference
may be made to Kirstine Smith (1916). We shall not consider the method at length
here for two reasons:

(a) it may be shown that for large samples the minimum-$\chi^2$ estimator tends to
the maximum-likelihood estimator;

(b) there is a modification of the method, considered below, which is much easier
to apply.


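18.9. For samples of fixed size n the distribution of the quantities $l_j$ is multinomial,
and we have for the likelihood function

$$L = \frac{n!}{\prod_j (l_j !)} \prod_j \left( \frac{\lambda_j}{n} \right)^{l_j}
= \frac{n!}{\prod_j (l_j !)} \prod_j \left( \frac{l_j}{n} \right)^{l_j} \prod_j \left( \frac{\lambda_j}{l_j} \right)^{l_j}. \qquad (18.22)$$

Thus

$$\log L = \text{constant} + \sum_j l_j \log \left( \frac{\lambda_j}{l_j} \right). \qquad (18.23)$$

Now for large samples we may put
$$\lambda_j = l_j + a_j n^{1/2},$$
where $a_j$ is finite, so that $a_j n^{1/2}$ is small compared with $l_j$; $|a_j n^{1/2}| < l_j$ and $\sum (a_j) = 0$.
Hence, from (18.23),
$$\log L = k + \sum_j l_j \log \left( 1 + \frac{a_j n^{1/2}}{l_j} \right)$$

$$= k + \sum_j l_j \left\{ \frac{a_j n^{1/2}}{l_j} - \frac{a_j^2 n}{2 l_j^2} + \ldots \right\}$$

$$= k - \tfrac{1}{2} \sum_j \frac{a_j^2 n}{l_j} + \ldots \qquad (18.24)$$
Now write
$$\chi'^2 = \sum_j \frac{(l_j - \lambda_j)^2}{l_j} = \sum_j \frac{a_j^2 n}{l_j}. \qquad (18.25)$$

Then we see that, to order $n^{-1/2}$, L is maximised by minimising $\chi'^2$. This latter quantity
is not the same as $\chi^2$ because the denominator terms are l's instead of $\lambda$'s. However, for
large n the difference is of order $n^{-1/2}$, for

$$\chi^2 - \chi'^2 = \sum (l - \lambda)^2 \left[ \frac{1}{\lambda} - \frac{1}{l} \right]
= \sum \frac{(l - \lambda)^3}{l \lambda}
= -\sum \frac{a^3 n^{3/2}}{l \lambda}.$$

Hence, to order $n^{-1/2}$ the estimates obtained by minimising either $\chi^2$ or $\chi'^2$ will be equivalent
to maximising L.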


18.10. The advantage of using $\chi'^2$ instead of $\chi^2$ in practice resides in the fact that
the denominators in the former are integral. However, if there are any empty cells (i.e.
those for which $l_j = 0$) the formula (18.25) requires some modification.

In the likelihood function, if $l_j = 0$, $\left( \lambda_j / n \right)^{l_j} = 1$ for all $\lambda$. The substitution

$$\lambda_j = l_j + a_j n^{1/2}$$
will give us, for the empty cells, a term in (18.24) equal to $-\sum a_j n^{1/2} = -\sum \lambda_j = -M$,
say. Hence we have
$$\chi'^2 = \sum \frac{(l_j - \lambda_j)^2}{l_j} + 2M, \qquad (18.26)$$

where the summation takes place over non-empty cells and M is the sum of the theoretical
frequencies $\lambda$ in the empty cells.


Example 18.5

As an example (Jeffreys, 1941) we consider a case where the maximum likelihood
estimator is known, so that a comparison may be made with the result given by
minimum $\chi'^2$.

Col. (2) of the following table shows the frequency of women in the first class of Part II
of the Mathematical Tripos from 1910 to 1938 inclusive. Assuming that this distribution
follows the Poisson distribution $e^{-\theta} \theta^j / j!$, to estimate $\theta$.

   (1)         (2)                   (3)                               (4)
Number of   Frequency     Expected frequency lambda_j        Contribution to chi'^2
firsts, j     l_j       theta=1   theta=1.5   theta=2      theta=1    theta=1.5   theta=2
    0           6         10.7       6.5        3.9          3.7        0.0        0.7
    1           8         10.7       9.7        7.9          0.9        0.4        0.0
    2          11          5.3       7.3        7.9          3.0        1.2        0.9
    3           3          1.8       3.6        5.2          0.5        0.1        1.6
    4           0          0.5       1.4        2.6           -          -          -
    5           1          0.1       0.4        1.0          0.8        0.4        0
  over 5        0          0.0       0.1        0.5       2M = 1.0   2M = 3.0   2M = 6.2
  TOTALS       29                                            9.9        5.1        9.4

The sample mean (a sufficient estimator of $\theta$) is in this case 44/29 = 1.52 with a standard
error $\sqrt{\theta / n} = 0.23$.

To apply minimum $\chi'^2$ we have to express the theoretical frequencies in terms of $\theta$.
This results in an unmanageable equation if we then substitute in $\chi'^2$. Instead we calculate
the minimum by finding $\chi'^2$ for some trial values of $\theta$ (in this case 1, 1.5 and 2) and
then interpolating.

The expectations $\lambda$ for the three selected values of $\theta$ are shown in column (3) of the
table and the corresponding $\chi'^2$ in column (4). It is found that, writing $\theta = 1.5 + \epsilon$,
the values of $\chi'^2$ may be represented by the quadratic

$$\chi'^2 = 5.1 - 0.5 \epsilon + 18.2 \epsilon^2.$$
The minimum of this is given by $\epsilon = 0.01$, and hence our estimate of $\theta$ is 1.51, very close
to the value of 1.52 given by the maximum likelihood estimator.
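
The arithmetic of this example is easily mechanised. The sketch below recomputes the expected Poisson frequencies, evaluates the modified criterion (18.26) with the 2M allowance for empty cells, and interpolates a quadratic through the three trial values. The function names are ours, and because the expectations are carried to full precision rather than rounded to one decimal place the totals differ slightly from those printed in the table.

```python
import math

# Observed frequencies l_j for j = 0, 1, ..., 5; the "over 5" cell is empty.
observed = [6, 8, 11, 3, 0, 1]
n = sum(observed)


def expected(theta):
    """Expected Poisson frequencies for j = 0..5 followed by the 'over 5' cell."""
    cells = [n * math.exp(-theta) * theta ** j / math.factorial(j) for j in range(6)]
    cells.append(n - sum(cells))
    return cells


def chi_prime_sq(theta):
    """Modified criterion (18.26): (l - lambda)^2 / l over non-empty cells, plus 2M."""
    total = 0.0
    for l, lam in zip(observed + [0], expected(theta)):
        if l > 0:
            total += (l - lam) ** 2 / l
        else:
            total += 2.0 * lam      # contribution of an empty cell to 2M
    return total


def quadratic_minimum(f, centre, step):
    """Abscissa of the minimum of the parabola through three equally spaced points."""
    y_minus, y_mid, y_plus = f(centre - step), f(centre), f(centre + step)
    a = (y_plus + y_minus - 2.0 * y_mid) / (2.0 * step ** 2)
    b = (y_plus - y_minus) / (2.0 * step)
    return centre - b / (2.0 * a)


estimate = quadratic_minimum(chi_prime_sq, 1.5, 0.5)
sample_mean = sum(j * l for j, l in enumerate(observed)) / n

print(chi_prime_sq(1.0), chi_prime_sq(1.5), chi_prime_sq(2.0))
print(estimate, sample_mean)
```

The interpolated minimum falls close to the sample mean 44/29, in agreement with the conclusion of the example.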


18.11. On theoretical grounds there seems no reason to use minimum $\chi^2$ instead of
maximum likelihood. The method has some practical value, however, where the maximum
likelihood equations are difficult to solve. We can usually follow the device of the
example just given, find $\chi^2$ or $\chi'^2$ for some trial values of the parameter, and approximate
to the value which minimises $\chi^2$ or $\chi'^2$. Whether this is easier than finding the maximum
likelihood estimate in the same sort of way depends on the circumstances of the case, but
it may well be so when the frequency function is a tabulated integral, so that expected
frequencies for specified parameter-values can be readily obtained.


18.12. In the manner of 17.39 we can estimate the loss of information occasioned
by the use of minimum $\chi^2$. We have, for the minimum of $\chi^2$,

$$\frac{\partial}{\partial \theta} \sum \frac{(l - \lambda)^2}{\lambda} = 0,$$

which reduces to

$$\sum \frac{l^2}{\lambda^2} \frac{\partial \lambda}{\partial \theta} = 0. \qquad (18.27)$$
Since $(l + \lambda)/\lambda$ tends to the constant value 2 for large samples, and $\sum \partial \lambda / \partial \theta = 0$, this is equivalent to the
maximum likelihood equation
$$\sum \frac{l}{\lambda} \frac{\partial \lambda}{\partial \theta} = 0, \qquad (18.28)$$
confirming that maximum likelihood and minimum $\chi^2$ give the same results in the limit.


Since 
P= 42 = 2310 2) 4: (L— 2)? 


the deviation of x from its mean is 
12 — 22 04 (lL — 2)* aa aoa 
42 8 a po 
the first term vanishing on summation. As in 17.39 we find the variance of this quantity 
ð log L 


is constant. We have 


within samples for which 30 


X? (22) 


eee 
eer) 
2 
A" 
A*[ — 
ix( 2) >y ( >) ab RE) 
a? 


var X k (l — 4)? = 2 X (332) — Ers (kà3) — 2 


i 


ji We find 


and on substituting k = $ 


giving the loss of information. 

As the sample size increases, this quantity remains finite. It is interesting to observe,
however, that as the number of classes increases it also increases without limit, indicating
that minimum $\chi^2$ breaks down for fine grouping.


"Inverse" Probability


18.13. According to Bayes' theorem (7.24), if $h(\theta)\, d\theta$ is the prior probability of $\theta$,
the posterior probability is given by

$$P(\theta \mid x_1, \ldots, x_n) \propto L(x_1, \ldots, x_n, \theta)\, h(\theta)\, d\theta. \qquad (18.31)$$

It is then easy to determine the "most probable" value of $\theta$ by maximising $L\, h(\theta)$ if we
know $h(\theta)$. The principles of inference with which we have been concerned up to the
present do not require the notion of the probability of $\theta$ and, even if they did, would not
give any guide to the nature of the function $h(\theta)$. In fact, to an adherent of the frequency
theory of probability, the prior probability of $\theta$ requires the distribution of $\theta$ in some form,
and if $\theta$ is merely an unknown constant it has no distribution (except the trivial one that
f = 1 when $\theta$ takes its true value and f = 0 elsewhere). The alternative school of thought
assumes the existence of $h(\theta)$ as denoting a prior measure of belief, but, in order to find
the most probable value of $\theta$, has to make some further assumption as to its values comparable
to Bayes' postulate that for a finite range $h(\theta)$ is a constant.

We have already noted that on this assumption the maximisation of L is equivalent
to finding the value of $\theta$ with the greatest posterior probability. It is also interesting to
note that, whatever the form of $h(\theta)$, maximum likelihood tends to give the same estimator
as the method of maximising posterior probability for large n. In fact, for the maximisation
of P in (18.31) we have

$$\frac{\partial \log P}{\partial \theta} = \frac{\partial \log L}{\partial \theta} + \frac{\partial \log h}{\partial \theta} = 0. \qquad (18.32)$$

In ordinary cases the variance of $\partial \log L / \partial \theta$ is of order n, whereas the second term is independent
of n. In the limit, therefore, the second term is negligible and we are reduced to
the likelihood equation

$$\frac{\partial \log L}{\partial \theta} = 0.$$
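
A simple case shows how quickly the prior term is swamped. The sketch below is an illustration of our own, not an example from the text: it takes x successes in n trials and assumes a prior $h(\theta)$ proportional to $\theta^{a-1}(1-\theta)^{b-1}$, for which the product $L\,h$ is maximised at (x + a - 1)/(n + a + b - 2), provided x + a and n - x + b both exceed unity. The maximum likelihood estimate is x/n.

```python
def ml_estimate(x, n):
    """Maximum likelihood estimate of a binomial proportion."""
    return x / n


def posterior_mode(x, n, a, b):
    """Maximum of theta^(x+a-1) * (1-theta)^(n-x+b-1).

    Valid when x + a > 1 and n - x + b > 1 (assumed here).
    """
    return (x + a - 1.0) / (n + a + b - 2.0)


a, b = 5.0, 2.0                    # a deliberately lop-sided hypothetical prior
sample_sizes = (10, 100, 1000, 100000)

gaps = []
for n in sample_sizes:
    x = 0.3 * n                    # keep the observed proportion fixed at 0.3
    gaps.append(abs(posterior_mode(x, n, a, b) - ml_estimate(x, n)))

for n, gap in zip(sample_sizes, gaps):
    print(n, gap)
```

The gap between the two estimates falls off roughly as 1/n, whatever values of a and b are assumed.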


Least Squares 

18.14. The method of least squares bears an analogy to minimum $\chi^2$. Suppose
we have an expression depending on a number of unknown parameters $\theta_1, \ldots, \theta_p$ and
certain observed values x. This can be thrown into a form such as

$$k(x, \theta_1, \ldots, \theta_p) = 0, \qquad (18.33)$$

where k is a given function (not a frequency function). If we have n values of x and n > p
it is not in general possible to solve the n resulting equations of type (18.33) for the $\theta$'s. We then
consider the "residuals" $k(x_j, \theta_1, \ldots, \theta_p)$, and the principle of least squares states that
the values of $\theta_1, \ldots, \theta_p$ are to be chosen so that
$$\sum_j \{ k(x_j, \theta_1, \ldots, \theta_p) \}^2 = \text{minimum}, \qquad (18.34)$$
or, in other words, so as to satisfy the p equations
$$\sum_j k(x_j, \theta_1, \ldots, \theta_p) \frac{\partial k}{\partial \theta_i} = 0, \quad i = 1, \ldots, p. \qquad (18.35)$$


18.15. Consider the case when the residuals are all distributed normally with variance
$\sigma^2$. The logarithm of the likelihood is then (except for constants)
$$\log L = -n \log \sigma - \frac{1}{2\sigma^2} \sum_j k^2(x_j, \theta_1, \ldots, \theta_p), \qquad (18.36)$$
and this is clearly maximised by minimising the sum (18.34). In this case, then, the method
of least squares is equivalent to the method of maximum likelihood. In other cases it
may give different results, and the justification for using it then becomes more or less
empirical.
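
For the commonest case, a straight line, the equations (18.35) can be solved directly. The sketch below assumes the residual function $k = y - \theta_1 - \theta_2 x$ (our choice of illustration, with hypothetical data), solves the two resulting equations, and verifies that the sums $\sum k$ and $\sum kx$ vanish at the solution and that the sum of squares (18.34) is not reduced by a small displacement of either parameter.

```python
def fit_line(xs, ys):
    """Solve the two equations (18.35) for k = y - theta1 - theta2 * x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    theta2 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    theta1 = (sy - theta2 * sx) / n
    return theta1, theta2


def residuals(xs, ys, theta1, theta2):
    return [y - theta1 - theta2 * x for x, y in zip(xs, ys)]


def sum_of_squares(xs, ys, theta1, theta2):
    """The quantity minimised in (18.34)."""
    return sum(r * r for r in residuals(xs, ys, theta1, theta2))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]     # hypothetical observations
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

theta1, theta2 = fit_line(xs, ys)
res = residuals(xs, ys, theta1, theta2)

# Left-hand sides of (18.35), apart from sign: both should vanish.
normal_equations = (sum(res), sum(r * x for r, x in zip(res, xs)))

best = sum_of_squares(xs, ys, theta1, theta2)
nearby = [
    sum_of_squares(xs, ys, theta1 + d1, theta2 + d2)
    for d1, d2 in ((0.05, 0.0), (-0.05, 0.0), (0.0, 0.05), (0.0, -0.05))
]

print(theta1, theta2, normal_equations, best, nearby)
```

By (18.36), with normal residuals of fixed variance the same pair of values also maximises the likelihood.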


18.16. The most important case occurring in statistical theory of the use of the 
method of least squares concerns regression equations. We have already seen that the 
coefficients of regression are, in effect, determined so as to minimise the sum of squares of 
residuals (cf. 15.2). We also know that, for the multiple normal distribution, residuals 
from the population regression lines are, in fact, normally distributed (15.13). For normal 
variation, therefore, the method of least squares is equivalent to maximum likelihood so
far as concerns the simultaneous estimation of regression coefficients. 


18.17. This is a convenient point to prove a theorem (due to Gauss) which in one
form or another is constantly occurring in statistical theory, particularly in connection
with the normal distribution. Suppose we have a population (not necessarily normal)
in which the regression of one variate y on the others $x_0 \, (= 1), x_1, \ldots, x_p$ is given by

$$y = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p. \qquad (18.37)$$
The x's may be correlated among themselves and, in the extreme case, functionally related,
so that this case includes that of curvilinear regression for our present purposes. Suppose
that we have a sample of n values, where n > p. Denoting by $\Sigma$ summation over these
n values, we determine the estimates of the $\beta$'s by minimising the sum of squares, e.g.
$$\Sigma (y - \beta_0 - \beta_1 x_1 - \ldots - \beta_p x_p)^2.$$
Suppose that $b_0, \ldots, b_p$ are the solutions of this process. Then our regression formula is
$$y - b_0 - b_1 x_1 - \ldots - b_p x_p = 0. \qquad (18.38)$$
The observed residuals, obtained by substituting the observed values in this equation,
are typified by

$$e = y - b_0 - b_1 x_1 - \ldots - b_p x_p, \qquad (18.39)$$
whereas the "real" residuals are typified by
$$\epsilon = y - \beta_0 - \beta_1 x_1 - \ldots - \beta_p x_p. \qquad (18.40)$$
We proceed to compare the sampling variances of e and $\epsilon$ and to show that
$$\mathrm{var}\, \epsilon = \frac{n}{n - p - 1} \mathrm{var}\, e, \qquad (18.41)$$
provided that the residuals are uncorrelated.
Let us transform the observed values of the x's to new values $\xi_0, \xi_1, \ldots, \xi_p$ (n for
each) such that
$$\Sigma (\xi_j x_k) = 1, \quad j = k,$$
$$\Sigma (\xi_j x_k) = 0, \quad j \ne k, \qquad (18.42)$$
$$\Sigma (\xi_j y) = b_j.$$

This involves, for each $\xi$, p + 1 equations in n unknowns and is therefore possible in general.
We then have

$$\Sigma \xi_r (e - \epsilon) = \Sigma \xi_r \{ (\beta_0 - b_0) + (\beta_1 - b_1) x_1 + \ldots + (\beta_p - b_p) x_p \}$$

$$= \beta_r - b_r.$$
But
$$\Sigma \xi_r e = \Sigma (\xi_r y) - \Sigma \xi_r \{ b_0 + b_1 x_1 + \ldots + b_p x_p \} = b_r - b_r = 0.$$
Hence
$$\beta_r - b_r = -\Sigma \xi_r \epsilon. \qquad (18.43)$$

Now
$$\Sigma e (\epsilon - e) = \Sigma \{ y - b_0 - b_1 x_1 - \ldots - b_p x_p \} \{ (b_0 - \beta_0) + \ldots + (b_p - \beta_p) x_p \} = 0,$$

since the summations give terms the vanishing of which determines the b's. Hence

$$\Sigma \epsilon^2 - \Sigma e^2 = \Sigma (\epsilon - e)^2 = \Sigma (\epsilon - e) \epsilon$$
$$= S \, \Sigma (\xi_j \epsilon) \, \Sigma (x_j \epsilon),$$
where S denotes summation over the (p + 1) values of j,
$$= S \{ \Sigma \xi_j x_j \epsilon^2 \} + \text{cross-product terms in } \epsilon,$$
$$= S \epsilon^2 + \text{cross-product terms}.$$
When we take expectations the cross-product terms vanish since the residuals are uncorrelated.
Hence
$$E (\Sigma \epsilon^2) - E (S \epsilon^2) = E \, \Sigma e^2,$$
or
$$(n - p - 1) \, \mathrm{var}\, \epsilon = n \, \mathrm{var}\, e, \qquad (18.44)$$
from which (18.41) follows at once.
For normal variation we shall consider this result from a slightly different viewpoint
in Chapter 22.
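
Since the theorem does not require normality, it may be instructive to check (18.44) by sampling with residuals that are plainly not normal. The sketch below is an assumption-laden illustration: it takes p = 1, a fixed set of hypothetical x-values, hypothetical regression coefficients, and independent residuals uniform on (-1, 1), whose variance is 1/3. The mean of the observed sum of squared residuals should then lie close to (n - p - 1) var $\epsilon$.

```python
import random


def residual_sum_of_squares(xs, ys):
    """Fit y = b0 + b1 * x by least squares and return the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


rng = random.Random(7)                     # fixed seed for a repeatable run
xs = [0.0, 1.0, 2.0, 3.0, 5.0, 8.0, 9.0, 12.0]
n, p = len(xs), 1
var_eps = 1.0 / 3.0                        # variance of a uniform variate on (-1, 1)
replications = 20000

total = 0.0
for _ in range(replications):
    ys = [2.0 + 0.5 * x + rng.uniform(-1.0, 1.0) for x in xs]
    total += residual_sum_of_squares(xs, ys)

mean_rss = total / replications            # estimates E(sum of e^2) = n var e
predicted = (n - p - 1) * var_eps          # right-hand side implied by (18.44)

print(mean_rss, predicted)
```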


NOTES AND REFERENCES 
The approach to minimum-variance estimators through the Calculus of Variations is
due to Aitken and Silverstone (1942). For minimum $\chi^2$ see K. Smith (1916) and R. A.
Fisher (1922a, 1925b). For the modification $\chi'^2$ see Jeffreys (1938b, 1939b, 1941).
A method of estimation essentially depending on the median has been proposed for
use in quality control, but its value is as yet problematical. For an account of the technique
see Simon (1941).


EXERCISES 
18.1. From the property that the variance of a minimum-variance estimator is
equal to $\lambda$ show that the most general distribution for which the sample mean is a sufficient
estimator is

$$f(x, \theta) = c(x) \exp\left[ -\frac{1}{2\sigma^2} (x - \theta)^2 \right],$$

where c is an arbitrary function and $\sigma^2$ is the variance of f.
Hence show that no Pearson curve other than the normal admits the sample-mean
as a sufficient estimator, but that a Gram-Charlier series may do so.
(Aitken and Silverstone, 1942.)
18.2. If the function 4 exists and 
A E TUR 
«A (0) 
show that the variance of the estimator ¢ is 
aoig 
n 0a" 
where q is the function of 18.7. (Aitken and Silverstone, 1942.) 


18.3. If a population $(p + q)^4$ is regarded as distributed in 5 classes, show that the
intrinsic accuracy is $\dfrac{4}{pq}$. Show further that the loss of information through estimating
p from minimum $\chi^2$ is
(p= 49)? (4 = 
Pe (3p? — 2pq + 34°) “Spi q? (p* — 2p*q + 18p? g? — 2pg? + q*y. 
This is least when p = q and is then equivalent to the loss of 5 observations.
(Fisher, 1925b.)


CHAPTER 19 
CONFIDENCE INTERVALS 


19.1. In the previous two chapters we have been concerned with methods which
will provide an estimate of the value of one or more unknown parameters; and the methods
gave functions of the sample values (the estimators) which, for any given sample, provided
a unique estimate. It was of course fully recognised that the estimate might differ
from the parameter in any particular case, and hence that there was a margin of uncertainty.
The extent of this uncertainty was expressed in terms of the sampling variance
of the estimator. With the somewhat intuitional approach which has served our purpose
up to this point, we say that it is probable that $\theta$ lies in the range $t \pm \sqrt{\mathrm{var}\, t}$, very probable
that it lies in the range $t \pm 2\sqrt{\mathrm{var}\, t}$, and so on. In short, what we have done is in effect
to locate $\theta$ in a range and not at a particular point, although we have regarded one point
in the range, viz. t itself, as having a claim to be considered as the "best" estimate of $\theta$.


19.2. In the present chapter we shall examine the logic of this procedure more
closely and look at the problem of estimation from a different point of view. We now
abandon attempts to estimate $\theta$ by a function which, for a specified sample, gives a unique
number. Instead we shall consider merely the specification of a range in which $\theta$ lies.
We shall not attempt to specify whereabouts in the interval the value of $\theta$ really is; all
values in the range have an equal claim to be taken as the "true" value. Nor shall we
assess the probability that $\theta$ lies in the interval in the sense that $\theta$ is regarded as a random
variable. In fact, in the frequency theory of probability $\theta$ is not a random variable (except
trivially in that the frequency of $\theta$ is unity when it takes the true value and is zero elsewhere).
Nevertheless, probability plays an essential part in the determination of the
interval and in the degree of confidence we have that it "covers" $\theta$.


Case of one Unknown Parameter 


19.3. Consider in the first place a population dependent on a single unknown parameter
$\theta$ and suppose that we are given a random sample of n values $x_1, \ldots, x_n$ from the
population. Let z be a statistic dependent on the x's and on $\theta$, whose sampling distribution
is independent of $\theta$. (The examples given below will show that in some cases at least such
a statistic may be found.) Then, given any probability $\alpha$, we can find a value $z_\alpha$ such that

$$\int_{-\infty}^{z_\alpha} dF(z) = \alpha,$$

and this is true whatever the value of $\theta$. In the notation of the theory of probability we
shall then have

$$P\{ z \le z_\alpha \mid \theta \} = \alpha. \qquad (19.1)$$
Now it may happen that the inequality $z \le z_\alpha$ can be transformed to the form $\theta \le t_\alpha$ or
$\theta \ge t_\alpha$, where $t_\alpha$ is some function depending on the value $z_\alpha$ and the x's but not on $\theta$. For
instance, if $z = \bar{x} - \theta$ we shall have
$$\bar{x} - \theta \le z_\alpha$$
and hence
$$\theta \ge \bar{x} - z_\alpha.$$
If this transformation can be made we then have, from (19.1),
$$P\{ \theta \le t_\alpha \mid \theta \} = \alpha. \qquad (19.2)$$
More generally, suppose that we can find a function $t_\alpha$, depending on $\alpha$ and the x's
but not on $\theta$, such that (19.2) is true for all $\theta$. Then we may use this equation in probability
to make certain statements about $\theta$.
19.4. Note, in the first place, that we cannot assert that the probability is $\alpha$ that
$\theta$ does not exceed a constant $t_\alpha$. This statement (in the frequency theory of probability)
can only relate to the variation of $\theta$ in a population of $\theta$'s, and in general we do not know
that $\theta$ varies at all. If it is merely an unknown constant then the probability that $\theta \le t_\alpha$
is either unity or zero. We do not know which of these values is correct, but we do know
that one of them is correct.

We therefore look at the matter in another way. Although $\theta$ is not a random variable,
$t_\alpha$ is and will vary from sample to sample. Consequently, if we assert that $\theta \le t_\alpha$ in each
case presented for decision, we shall be right in a proportion $\alpha$ of the cases in the long run.
The statement that the probability of $\theta$ is less than or equal to some assigned value
has no meaning except in the trivial sense already mentioned; but the statement that
a statistic $t_\alpha$ is greater than or equal to $\theta$ (whatever $\theta$ happens to be) has a definite probability
$\alpha$ of being correct. If therefore we make it a rule to assert the inequality $\theta \le t_\alpha$
for any sample values which arise, we have the assurance of being right in a proportion
$\alpha$ of the cases "on the average" or "in the long run."

This idea is basic to the theory of confidence intervals which we proceed to develop,
and the reader should satisfy himself that he has grasped it.


19.5. To simplify the exposition we have considered only a single quantity $t_\alpha$ and
the statement that $\theta \le t_\alpha$. In practice, however, we usually seek for two quantities $t_0$
and $t_1$ such that
$$P\{ t_0 \le \theta \le t_1 \mid \theta \} = \alpha, \qquad (19.3)$$
and make the assertion that $\theta$ lies in the range $t_0$ to $t_1$. These quantities are known as the
Lower and Upper Confidence Limits respectively. They depend only on $\alpha$ and the sample
values. For any fixed $\alpha$ the totality of values of $t_0$ and $t_1$ for different samples determine
a field within which $\theta$ is asserted to lie. This field is called the Confidence Belt or Region
of Acceptance. We shall give a graphical representation of the idea below. The number
$\alpha$ is called the Confidence Coefficient.


Example 19.1 
Suppose we have a sample of n from the normal population with unit variance 


1 


dF = gq) P {— 4 (x — u)*) dz, — o «z «c. 

The distribution of means # will be 
ay Ro P eto = à 
dF Ji ow zë p az, o «z«o 


From the tables of the normal integral we know that the probability that a deviation
from the mean does not exceed twice the standard deviation in the positive direction
is 0.97725. We have then
$$P\{\bar{x} - \mu \le 2/\sqrt{n} \mid \mu\} = 0.97725,$$
which is equivalent to
$$P\{\bar{x} - 2/\sqrt{n} \le \mu \mid \mu\} = 0.97725.$$
Thus, if we assert that $\mu$ is greater than or equal to $\bar{x} - 2/\sqrt{n}$ we shall be right in about
97.725 per cent. of the cases.
Similarly we have
$$P\{\bar{x} - \mu \ge -2/\sqrt{n} \mid \mu\} = P\{\mu \le \bar{x} + 2/\sqrt{n} \mid \mu\} = 0.97725.$$
Hence, combining the two results,
$$P\{\bar{x} - 2/\sqrt{n} \le \mu \le \bar{x} + 2/\sqrt{n} \mid \mu\} = 2(0.97725) - 1 = 0.9545.$$
Hence, if we assert that $\mu$ lies in the range $\bar{x} \pm 2/\sqrt{n}$ we shall be right in about 95.45 per
cent. of the cases in the long run.
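
The long-run property just stated can be checked numerically. What follows is only a minimal sketch and not part of the original text: the function name `simulated_coverage`, the number of trials, the seed, and the values of $\mu$ and $n$ are all arbitrary assumptions made for illustration.

```python
import random
from statistics import NormalDist

def simulated_coverage(mu, n, trials=20000, seed=1):
    """Fraction of samples for which xbar - 2/sqrt(n) <= mu <= xbar + 2/sqrt(n)."""
    rng = random.Random(seed)
    half_width = 2 / n ** 0.5
    hits = 0
    for _ in range(trials):
        # mean of a sample of n from a normal population with unit variance
        xbar = sum(rng.gauss(mu, 1.0) for _ in range(n)) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

# Exact confidence coefficient from the normal integral: 2(0.97725) - 1
exact = 2 * NormalDist().cdf(2.0) - 1
# Any fixed mu would do here; the limits themselves are computed from xbar alone
observed = simulated_coverage(mu=3.7, n=5)
print(round(exact, 4), round(observed, 4))
```

The simulated proportion should lie close to the exact coefficient whatever value of $\mu$ is supplied, which is the sense in which the assertion is right "in the long run."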


Conversely, given the confidence coefficient we can easily find from the tables of the
normal integral the deviation $d$ such that
$$P\{\bar{x} - d/\sqrt{n} \le \mu \le \bar{x} + d/\sqrt{n} \mid \mu\} = \alpha.$$
For instance, if $\alpha = 0.8$, $d = 1.28$, so that if we assert that $\mu$ lies in the range
$\bar{x} \pm 1.28/\sqrt{n}$ the odds are 4 to 1 that we shall be right.
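
In place of printed tables, the deviation $d$ for a given confidence coefficient can be obtained from the inverse of the normal integral. The sketch below is an illustration only; `central_deviate` is a hypothetical helper name, and it assumes central limits, so that each tail is excluded with probability $(1 - \alpha)/2$.

```python
from statistics import NormalDist

def central_deviate(alpha):
    """Deviation d such that a standard normal variate lies in (-d, d) with probability alpha."""
    return NormalDist().inv_cdf((1 + alpha) / 2)

d_80 = central_deviate(0.8)      # the case quoted in the text
d_9545 = central_deviate(0.9545)  # should recover the deviation of two standard errors
print(round(d_80, 2), round(d_9545, 2))
```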

The reader to whom this approach is new will probably ask: but is this not a roundabout
way of using the standard error to set limits to an estimate of the mean? In a
way, it is. In effect, what we have done in this example is to show how the use of the
standard error of the mean in normal samples may be justified on logical grounds without
appeal to new principles of inference other than those incorporated in the theory of
probability itself. In particular we make no use of Bayes' postulate.

Another point of interest in this example is that the upper and lower confidence limits
derived above are equidistant from the mean $\bar{x}$. This is not by any means necessary,
and it is easy to see that we can derive any number of alternative limits for the same
confidence coefficient $\alpha$. Suppose, for instance, we take $\alpha = 0.9545$, and select two numbers
$\alpha_0$ and $\alpha_1$ which obey the condition
$$\alpha_0 + \alpha_1 - 1 = 0.9545,$$
say $\alpha_0 = 0.9645$ and $\alpha_1 = 0.99$. From the tables of the normal integral we have
$$P\{\bar{x} - \mu \le 2.326/\sqrt{n} \mid \mu\} = 0.99,$$
$$P\{\bar{x} - \mu \ge -1.806/\sqrt{n} \mid \mu\} = 0.9645,$$
and hence
$$P\{\bar{x} - 2.326/\sqrt{n} \le \mu \le \bar{x} + 1.806/\sqrt{n} \mid \mu\} = 0.9545.$$
Thus, with the same confidence coefficient we can assert that $\mu$ lies in the range $\bar{x} - 2/\sqrt{n}$
to $\bar{x} + 2/\sqrt{n}$, or in the range $\bar{x} - 2.326/\sqrt{n}$ to $\bar{x} + 1.806/\sqrt{n}$. In either case we shall be
right in about 95.45 per cent. of the cases.

We note that in the first case the range is $4/\sqrt{n}$ units and in the second case it is
$4.132/\sqrt{n}$ units. Other things being equal, we should choose the first set of limits since
they locate the parameter in a narrower range. We shall consider this point in more
detail below. It does not always happen that there is an infinity of possible confidence
limits or, if there is, that any simple rule of choice between them can be formulated.
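
The comparison of ranges can be reproduced for any admissible pair $\alpha_0$, $\alpha_1$. The following sketch is illustrative only (the helper name `range_length` is an assumption); it returns the length of the range in units of $1/\sqrt{n}$ for the normal mean with unit variance.

```python
from statistics import NormalDist

def range_length(alpha0, alpha1):
    """Length, in units of 1/sqrt(n), of the range whose two one-sided
    probabilities are alpha0 and alpha1 (so alpha = alpha0 + alpha1 - 1)."""
    z = NormalDist()
    return z.inv_cdf(alpha0) + z.inv_cdf(alpha1)

central = range_length(0.97725, 0.97725)  # equal tails
skewed = range_length(0.9645, 0.99)       # the unequal split used in the text
print(round(central, 3), round(skewed, 3))
```

Both calls correspond to a confidence coefficient of about 0.9545; the central choice gives the shorter range.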


Graphical Representation 

19.6. In a number of simple cases, including that of the previous example, the
confidence limits can be represented in a useful graphical form. We take two orthogonal
axes, OX relating to the observed $\bar{x}$ and OY to $\mu$ (see Fig. 19.1).


Fig. 19.1.


The two straight lines shown have as their equations
$$\mu = \bar{x} + 2, \qquad \mu = \bar{x} - 2.$$
Consequently, for any point between the lines,
$$\bar{x} - 2 \le \mu \le \bar{x} + 2.$$
Hence, if for any observed $\bar{x}$ we read off the two ordinates on the lines corresponding to
that value we obtain the two confidence limits. The vertical interval between the limits
is the confidence range (shown in the diagram for $\bar{x} = 1$), and the total zone between the
lines is the confidence belt. We may refer to the two lines as the Upper and Lower
Confidence lines respectively.

This example relates to the somewhat trivial case $n = 1$. For different values of $n$
there will be different confidence lines, all parallel to $\mu = \bar{x}$. They may be shown on a
single diagram for selected values of $n$, and a figure so constructed provides a useful method
of reading off confidence limits in practical work.


Central and Non-central Intervals

19.7. In Example 19.1 the sampling distribution on which the confidence intervals
were based was symmetrical, and hence, by taking equal deviations from the mean, we
reached equal areas of the frequency function as $\alpha_0$ and $\alpha_1$. In general we cannot achieve
this result with equal deviations, and subject always to the condition $\alpha_0 + \alpha_1 - 1 = \alpha$
the two quantities may be chosen arbitrarily.
If $\alpha_0$ and $\alpha_1$ are taken to be equal, we shall say that the intervals are central. In such
a case we have
$$P(t_0 \le \theta) = P(\theta \le t_1) = \frac{1 + \alpha}{2}. \tag{19.4}$$
In the contrary case the intervals will be called non-central.


19.8. In the absence of other considerations it is usually convenient to employ
central intervals, but circumstances sometimes arise in which non-central intervals are
more serviceable. Suppose, for instance, we are estimating the proportion of some drug
in a medicinal preparation and the drug is toxic in large doses. We must then clearly
err on the safe side, an excess of the true value over our estimate being more serious than
a deficiency. In such a case we might prefer to take $\alpha_1$ very near to unity or even equal
to unity, so that
$$P(\theta \le t_1) = 1,$$
$$P(t_0 \le \theta) = \alpha,$$
and we are certain that $\theta$ is not greater than $t_1$.

Again, if we are estimating the proportion of viable seed in a sample of material that
is to be placed on the market, we are more concerned with the accuracy of the lower limit
than that of the upper limit, for a deficiency of germination is more serious than an excess
from the grower's point of view. In such circumstances we should probably take $\alpha_0$ as
large as conveniently possible so as to be nearer to certainty about the minimum value
of viability. This kind of situation often arises in the specification of the quality of a
manufactured product, the seller wishing to guarantee a minimum standard but being
much less concerned with whether his product exceeds expectation.


19.9. On a somewhat similar point, it may be remarked that in certain circumstances
it is enough to know that $P\{t_0 \le \theta \le t_1 \mid \theta\}$ exceeds some quantity $\alpha$. We then
know that in asserting $\theta$ to lie in the range $t_0$ to $t_1$ we shall be right in at least a proportion
$\alpha$ of the cases. Mathematical difficulties in ascertaining confidence limits exactly for
given $\alpha$, or theoretical difficulties when the distribution is discontinuous may, for example,
lead us to be content with the inequality rather than the equality of (19.3).


Example 19.2 


To find confidence intervals for the parent proportion $\varpi$ of successes in sampling for
attributes.

In samples of $n$ the distribution of successes is given by the binomial $(\chi + \varpi)^n$, where
$\chi = 1 - \varpi$. We will determine the limits for the case $n = 20$ and confidence coefficient 0.95.
We require in the first instance the distribution function of the binomial, which is
obtainable from Table 5.2 (vol. I, p. 119). Summing the number of successes and dividing
by 10,000, we find from that table the following:

Proportion of
successes p     ϖ = 0.1    ϖ = 0.2    ϖ = 0.3    ϖ = 0.4    ϖ = 0.5

   0.00         0.1216     0.0115     0.0008       -          -
   0.05         0.3918     0.0691     0.0076     0.0005       -
   0.10         0.6770     0.2060     0.0354     0.0036     0.0002
   0.15         0.8671     0.4114     0.1070     0.0159     0.0013
   0.20         0.9569     0.6296     0.2374     0.0509     0.0059
   0.25         0.9888     0.8042     0.4163     0.1255     0.0207
   0.30         0.9977     0.9133     0.6079     0.2499     0.0577
   0.35         0.9997     0.9678     0.7722     0.4158     0.1316
   0.40         1.0001     0.9900     0.8866     0.5955     0.2517
   0.45         1.0002     0.9974     0.9520     0.7552     0.4119
   0.50           -        0.9994     0.9828     0.8723     0.5881
   0.55           -        0.9999     0.9948     0.9433     0.7483
   0.60           -        1.0000     0.9987     0.9788     0.8684
   0.65           -          -        0.9997     0.9934     0.9423
   0.70           -          -        0.9999     0.9983     0.9793
   0.75           -          -          -        0.9996     0.9941
   0.80           -          -          -        0.9999     0.9987
   0.85           -          -          -          -        0.9998
   0.90           -          -          -          -        1.0000
   0.95           -          -          -          -          -

The final figures may be a unit or two in error owing to rounding up, but that need
not bother us to the degree of approximation here considered. Values for $\varpi = 0.6$ to 0.9
may be obtained by symmetry.

We note in the first place that the variate $p$ is discontinuous. On the other hand
we are prepared to consider any value of $\varpi$ in the range 0 to 1. For given $\varpi$ we cannot
in general find limits to $p$ for which $\alpha$ is exactly 0.95; but we will take $p$ to be the nearest
multiple of 0.05 which gives confidence coefficients at least equal to 0.95, so as to be on
the safe side. We will consider only central intervals, so that for given $\varpi$ we have to find
$p_0$ and $p_1$ such that
$$P\{\varpi > p_0\} \ge 0.975,$$
$$P\{\varpi < p_1\} \ge 0.975,$$
the inequalities for $P$ being as near to equality as we can make them.

Consider the diagrammatic representation of the type shown in Fig. 19.1 and given
for our present case in Fig. 19.2.

From the table we can find, for any assigned $\varpi$, the values $\varpi_0$ and $\varpi_1$ such that
$P(p > \varpi_0) \ge 0.975$ and $P(p < \varpi_1) \ge 0.975$. Note that in determining $\varpi_0$ the distribution
function gives the probability of obtaining a proportion $p$ or less successes, so that the
complement of the function gives the probability of a proportion $1 - p - 0.05$ or less
(not $1 - p$). Here, for example, on the horizontal through $\varpi = 0.1$ we find $\varpi_0 = 0$ and
$\varpi_1 = 0.30$ from our table; and for $\varpi = 0.4$ we have $\varpi_0 = 0.15$ and $\varpi_1 = 0.65$. The points
so obtained lie on stepped curves which have been drawn in. The zone between them is
the confidence belt. For any $p$ the probability that we shall be wrong in locating $\varpi$ inside
the belt is at the most 0.05. We determine $p_0$ and $p_1$ by drawing a vertical at the given
value of $p$ on the abscissa and reading off the values where it intersects the curves. That
these are, in fact, the required limits will be shown in a moment.
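
The horizontal construction just described can be carried out by direct summation of the binomial instead of by reference to the table. The sketch below is an illustration under stated assumptions, not the author's procedure: the names `cdf` and `horizontal_counts` are hypothetical, the limits are returned as counts of successes out of $n = 20$ (divide by 20 to obtain proportions), and a lower count of $-1$ is used to signal that no proportion can be excluded below, which corresponds to the value 0 quoted in the text.

```python
from math import comb

N = 20  # sample size of Example 19.2

def cdf(k, w, n=N):
    """P(X <= k) for the number of successes X in n trials with parent proportion w."""
    if k < 0:
        return 0.0
    return sum(comb(n, j) * w**j * (1 - w)**(n - j) for j in range(min(k, n) + 1))

def horizontal_counts(w, n=N, tail=0.025):
    """Counts (k0, k1) with P(X > k0) >= 1 - tail and P(X < k1) >= 1 - tail,
    each taken as near to equality as the discontinuity allows."""
    k0 = -1  # stays at -1 when even P(X <= 0) exceeds the tail probability
    while k0 + 1 <= n and cdf(k0 + 1, w, n) <= tail:
        k0 += 1
    k1 = 0
    while cdf(k1 - 1, w, n) < 1 - tail:
        k1 += 1
    return k0, k1

for w in (0.1, 0.4):
    k0, k1 = horizontal_counts(w)
    print(w, max(k0, 0) / N, k1 / N)
```

For $\varpi = 0.1$ and $\varpi = 0.4$ the counts agree with the proportions read from the table in the text.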
We could have found more precise confidence limits by interpolating in the table
obtained above. For example, with $p = 0.30$ we see that

for $\varpi = 0.1$, $P = 0.9977$
for $\varpi = 0.2$, $P = 0.9133$.

Hence, for $P = 0.975$ we have approximately
$$\varpi = 0.1 + \frac{9977 - 9750}{9977 - 9133}\,(0.1) = 0.127,$$
and closer approximations can be obtained if desired. The corresponding point on the
lower confidence line to $\varpi = 0.127$ is $p = 0.35$. Calculations on these lines give us the
values of $\varpi$ such that
$$P\{p_0 \le \varpi \le p_1\} = \alpha \text{ exactly},$$
whereas the former approach gave values such that
$$P\{p_0 \le \varpi \le p_1\} = \alpha \text{ approximately}, \quad \ge \alpha \text{ in any case}.$$


Fig. 19.2. (Ordinate: values of $\varpi$; abscissa: values of $p$.)

Discontinuous variates usually give rise to this sort of arithmetical nuisance, but the 
approximation in practice is sufficiently good, except for very small samples. The broken 
curves in Fig. 19.2 give the more precise limits. They lie, of course, inside the more 
approximate step-curves. 
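
The linear interpolation may be compared with a direct numerical solution. The sketch below is illustrative only: `cdf` and `solve_proportion` are hypothetical helper names, and bisection is simply one convenient way, chosen here as an assumption, of obtaining the "closer approximations" the text mentions.

```python
from math import comb

def cdf(k, w, n=20):
    """P(X <= k) for a binomial with n trials and parent proportion w."""
    return sum(comb(n, j) * w**j * (1 - w)**(n - j) for j in range(k + 1))

# Linear interpolation between the tabulated values at p = 0.30 (6 successes in 20)
w_interp = 0.1 + (0.9977 - 0.9750) / (0.9977 - 0.9133) * 0.1

def solve_proportion(k, target, n=20):
    """Parent proportion w with P(X <= k) = target, found by bisection.
    The distribution function decreases as w increases."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if cdf(k, mid, n) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

w_exact = solve_proportion(6, 0.975)
print(round(w_interp, 3), round(w_exact, 3))
```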

It is, perhaps, worth noticing that the points on the curves of Fig. 19.2 were constructed
by selecting an ordinate $\varpi$ and then finding the corresponding abscissae $\varpi_0$ and $\varpi_1$. The
diagram is, so to speak, constructed horizontally. In applying it, however, we read it
vertically, that is to say, with observed abscissa $p$ we read off two values $p_0$ and $p_1$ and
assert that $p_0 \le \varpi \le p_1$. It is instructive to observe how this change of viewpoint can
be justified without reference to Bayes' postulate.

Consider Fig. 19.3, which shows a pair of confidence lines for the binomial. Let $\varpi'$
be a given value of $\varpi$ and let the horizontal through $\varpi'$ meet the confidence lines in points
with abscissae $\varpi_0$ and $\varpi_1$. Then we know that in repeated samples from a population
with parameter $\varpi'$ a proportion $\alpha$ will give observed values of $p$ lying between $\varpi_0$ and $\varpi_1$;
for the curves were constructed so that this should be so.

Now since the horizontal at $\varpi'$ lies entirely within the confidence belt for $\varpi_0 \le p \le \varpi_1$
(and does so for any $\varpi'$), it follows that the assertion that $\varpi'$ lies in the belt is correct if,
and only if, $p$ lies between $\varpi_0$ and $\varpi_1$, that is in a proportion $\alpha$ of the cases. This, being
true for any $\varpi'$, is true for all $\varpi$, irrespective of the relative frequency of occurrence of the
$\varpi$'s under estimate. Consequently our assertion that $\varpi$ lies in the confidence belt is correct
in a proportion $\alpha$ of the cases; and, in particular, for any observed $p$ we may assert that
$\varpi$ lies within the ordinates determined on the two curves by the vertical through $p$.


Fig. 19.3. (Ordinate: values of $\varpi$; abscissa: values of $p$.)
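
That the result does not depend on the relative frequency with which different values of $\varpi$ occur can be illustrated by exact computation. The sketch below is an assumption-laden illustration, not part of the original argument: the helper names, the grid of parent proportions and the two weighting systems are all arbitrary choices. It uses the stepped belt for $n = 20$, for which the proportion of correct assertions is at least 0.95 at every $\varpi$.

```python
from math import comb

def cdf(k, w, n=20):
    """P(X <= k) for a binomial with n trials and parent proportion w."""
    if k < 0:
        return 0.0
    return sum(comb(n, j) * w**j * (1 - w)**(n - j) for j in range(min(k, n) + 1))

def coverage(w, n=20, tail=0.025):
    """Probability that the observed count falls inside the belt on the horizontal through w."""
    k0 = -1
    while k0 + 1 <= n and cdf(k0 + 1, w, n) <= tail:
        k0 += 1
    k1 = 0
    while cdf(k1 - 1, w, n) < 1 - tail:
        k1 += 1
    # counts strictly between k0 and k1 lie in the belt
    return cdf(k1 - 1, w, n) - cdf(k0, w, n)

grid = [i / 100 for i in range(1, 100)]
per_value = [coverage(w) for w in grid]

# Two very different (hypothetical) frequencies of occurrence of the parent proportions
uniform = sum(per_value) / len(per_value)
weights = [w**3 for w in grid]
skewed = sum(c * q for c, q in zip(per_value, weights)) / sum(weights)
print(round(min(per_value), 4), round(uniform, 4), round(skewed, 4))
```

Since the proportion of correct assertions is at least the nominal value for every single $\varpi$, any weighted average of it is also at least that value.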


Confidence Intervals for Large Samples 
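19.10. In our usual notation, the logarithm of the likelihood function gives
$$\log L = \sum_{j=1}^{n} \log f(x_j, \theta), \tag{19.5}$$
and
$$\frac{\partial \log L}{\partial \theta} = \sum \frac{\partial \log f}{\partial \theta}. \tag{19.6}$$
We may regard $\partial \log L / \partial \theta$ as a random variable, and in particular write
$$nA = \operatorname{var}\left(\frac{\partial \log L}{\partial \theta}\right),$$
so that
$$A = \operatorname{var}\left(\frac{\partial \log f}{\partial \theta}\right). \tag{19.7}$$
Write
$$\psi = \frac{\partial \log L / \partial \theta}{\sqrt{nA}}. \tag{19.8}$$
Then, for large samples, $\psi$ will be distributed normally in the limit with unit variance, in
virtue of the Central Limit Theorem, under very general conditions. It will also have
zero mean, since
$$E\left(\frac{\partial \log f}{\partial \theta}\right) = \int \frac{1}{f}\frac{\partial f}{\partial \theta}\, f\, dx = \frac{\partial}{\partial \theta} \int f\, dx = \frac{\partial}{\partial \theta}\, 1 = 0. \tag{19.9}$$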


Hence, from the distribution of $\psi$ we may easily determine confidence limits for $\theta$ in large
samples if $\psi$ is a monotonic function of $\theta$, so that inequalities in one may be transformed to
inequalities in the other.

It is sufficient (but not necessary) for the existence of the normal limit to $\psi$ that
$\partial \log f / \partial \theta$ exists for all $x$, except perhaps at isolated points, that the range is independent
of $\theta$ and that the Central Limit Theorem applies (e.g. if the third moment of
$\partial \log f / \partial \theta$ exists). We also assume, as usual, that differentiation under the integral sign,
as in (19.9), is legitimate.
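
Example 19.3
Consider again the problem of Example 19.1. We have, with $\mu$ for $\theta$,
$$f(x, \mu) = \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(x - \mu)^2\},$$
$$\frac{\partial \log f}{\partial \mu} = x - \mu,$$
$$\operatorname{var}\left(\frac{\partial \log f}{\partial \mu}\right) = \int (x - \mu)^2 f\, dx = 1.$$
Hence
$$\psi = \frac{\sum (x - \mu)}{\sqrt{n}} = (\bar{x} - \mu)\sqrt{n}$$
is normally distributed with unit variance for large $n$. (We know, of course, that this
is true for small $n$ as well in this particular case.) The confidence limits may then be set
as in Example 19.1.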


Example 19.4 
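Consider the Poisson distribution whose general term is
$$f(x, \lambda) = \frac{e^{-\lambda} \lambda^x}{x!}.$$
We have
$$\frac{\partial \log f}{\partial \lambda} = \frac{x}{\lambda} - 1,$$
$$\operatorname{var}\left(\frac{\partial \log f}{\partial \lambda}\right) = \sum_{x=0}^{\infty} \left(\frac{x}{\lambda} - 1\right)^2 \frac{e^{-\lambda} \lambda^x}{x!} = \frac{1}{\lambda}.$$
Hence
$$\psi = \frac{\sum (x/\lambda - 1)}{\sqrt{n/\lambda}} = (\bar{x} - \lambda)\sqrt{\frac{n}{\lambda}}.$$
For example, with $\alpha = 0.95$, corresponding to a normal deviate $\pm 1.96$, we have, for the
central confidence limits,
$$(\bar{x} - \lambda)\sqrt{\frac{n}{\lambda}} = \pm 1.96,$$
giving, on solution for $\lambda$,
$$\lambda^2 - \left(2\bar{x} + \frac{3.84}{n}\right)\lambda + \bar{x}^2 = 0,$$
or
$$\lambda = \bar{x} + \frac{1.92}{n} \pm \sqrt{\frac{3.84\,\bar{x}}{n} + \frac{3.69}{n^2}},$$
the ambiguity in the square root giving upper and lower limits respectively.
To order $n^{-1/2}$ this is equivalent to
$$\lambda = \bar{x} \pm 1.96\sqrt{\frac{\bar{x}}{n}},$$
from which the upper and lower limits are seen to be equidistant from the mean $\bar{x}$, as we
should expect.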
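
The two forms of the Poisson limits can be compared numerically. The sketch below is illustrative only; the function names and the sample figures (a mean of 4.0 from 50 observations) are assumptions made for the purpose of the example.

```python
from math import sqrt

def poisson_limits(xbar, n, deviate=1.96):
    """Roots of (xbar - lam)^2 * n / lam = deviate^2, i.e. the large-sample limits for lam."""
    c = deviate**2 / n
    centre = xbar + c / 2
    half = sqrt(c * xbar + c * c / 4)
    return centre - half, centre + half

def poisson_limits_approx(xbar, n, deviate=1.96):
    """The same limits to order n^(-1/2): xbar -/+ deviate * sqrt(xbar / n)."""
    half = deviate * sqrt(xbar / n)
    return xbar - half, xbar + half

lower, upper = poisson_limits(4.0, 50)
lower_a, upper_a = poisson_limits_approx(4.0, 50)
print(round(lower, 3), round(upper, 3), round(lower_a, 3), round(upper_a, 3))
```

The exact roots are not quite equidistant from the sample mean; the difference from the approximate limits is of order $1/n$.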


Shortest Sets of Confidence Intervals 

19.11. It has been seen in Example 19.1 that in some circumstances at least there
exists more than one set of confidence intervals, and it is now necessary to consider whether
any particular set can be regarded as better than the others in any useful sense. The
problem is analogous to that of estimators, where we found that in general there are many
different estimators for a parameter, but that we could sometimes find one (such as that
with minimum variance) which was superior to the rest.

In Example 19.1 the problem presented itself in rather a specialised form. We found
that for the intervals based on the mean $\bar{x}$ there were infinitely many sets of intervals
according to the way in which we selected $\alpha_0$ and $\alpha_1$ (subject to the condition that
$\alpha_0 + \alpha_1 = 1 + \alpha$). Among these the central intervals are obviously the shortest, for a
given range will include the greatest area of the normal curve if it is centred at the mean
of the curve. We might reasonably say that the central intervals are the best among
those determined by $\bar{x}$.

But it does not follow that they are the shortest of all possible intervals, or even that
such a shortest set exists. It might also happen that for two sets of intervals $c_1$ and $c_2$
those of $c_1$ are shorter than those of $c_2$ in part of the range of $x$'s and longer in other parts.

19.12. We will therefore consider sets of intervals which are shortest on the average.
That is to say, if
$$\delta = t_1 - t_0$$
we require that
$$\int \delta\, dF = \text{minimum}, \tag{19.10}$$
where the integral is taken over all $x$'s and is therefore equivalent to
$$\int \cdots \int \delta\, L\, dx_1 \cdots dx_n = \text{minimum}. \tag{19.11}$$


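We now prove a theorem which is very similar to the result that maximum-likelihood
estimators in the limit have minimum variance, namely that in a certain class of intervals
the method of 19.10 gives those which are shortest on the average.

Let $h(x, \theta)$ be a function which has a zero mean value and is such that the sum of
a number of similar functions obeys the Central Limit Theorem. Then
$$\zeta = \frac{\sum h(x_j, \theta)}{\sqrt{n \operatorname{var} h}} \tag{19.12}$$
is normally distributed in the limit with zero mean and unit variance. $\psi$ of equation
(19.8) is a member of the class $\zeta$. We prove that the average rate of change of $\psi$ with
respect to $\theta$, for each fixed $\theta$, is greater than that of any $\zeta$ except in the trivial case
in which $h$ is a constant multiple of $\partial \log f / \partial \theta$.
Writing $g$ for $\partial \log f / \partial \theta$, we have
$$\frac{\partial \psi}{\partial \theta} = \frac{1}{\sqrt{n \operatorname{var} g}} \sum \frac{\partial g}{\partial \theta} - \frac{\sum g}{2 \sqrt{n}\, (\operatorname{var} g)^{3/2}} \frac{\partial \operatorname{var} g}{\partial \theta}, \tag{19.13}$$
$$\frac{\partial \zeta}{\partial \theta} = \frac{1}{\sqrt{n \operatorname{var} h}} \sum \frac{\partial h}{\partial \theta} - \frac{\sum h}{2 \sqrt{n}\, (\operatorname{var} h)^{3/2}} \frac{\partial \operatorname{var} h}{\partial \theta}. \tag{19.14}$$
Hence
$$E\left(\frac{\partial \psi}{\partial \theta}\right) = \frac{n}{\sqrt{n \operatorname{var} g}}\, E\left(\frac{\partial g}{\partial \theta}\right) - \frac{n\, E(g)}{2 \sqrt{n}\, (\operatorname{var} g)^{3/2}} \frac{\partial \operatorname{var} g}{\partial \theta}.$$
Now $E(g) = 0$ and
$$E\left(\frac{\partial g}{\partial \theta}\right) = E\left(\frac{\partial^2 \log f}{\partial \theta^2}\right) = -E\left\{\left(\frac{\partial \log f}{\partial \theta}\right)^2\right\} = -E(g^2).$$
Thus
$$E\left(\frac{\partial \psi}{\partial \theta}\right) = -\frac{n\, E(g^2)}{\sqrt{n \operatorname{var} g}} = -\sqrt{n \operatorname{var} g} = A_1, \text{ say}. \tag{19.15}$$
Similarly,
$$E\left(\frac{\partial \zeta}{\partial \theta}\right) = \frac{n}{\sqrt{n \operatorname{var} h}}\, E\left(\frac{\partial h}{\partial \theta}\right) = A_2, \text{ say}. \tag{19.16}$$
Since $E(h) = 0$ we have
$$E\left(\frac{\partial h}{\partial \theta}\right) = \int \frac{\partial h}{\partial \theta}\, f\, dx = -\int h\, \frac{\partial f}{\partial \theta}\, dx = -\operatorname{cov}(h, g). \tag{19.17}$$
Hence
$$A_1^2 - A_2^2 = n \operatorname{var} g - \frac{n}{\operatorname{var} h} \operatorname{cov}^2(h, g) = \frac{n}{\operatorname{var} h} \{\operatorname{var} h \operatorname{var} g - \operatorname{cov}^2(h, g)\}. \tag{19.18}$$
Thus, unless $h$ is a multiple of $g$, we have
$$A_1^2 > A_2^2,$$
which was to be proved.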
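
As a worked illustration, which is not in the original text, take the normal population of Example 19.1 with $\theta$ for $\mu$, and the hypothetical choice $h = (x - \theta)^3$, which has zero mean and is not a multiple of $g = x - \theta$. Using the fourth and sixth moments of the unit normal distribution (3 and 15 respectively), equations (19.15) to (19.18) give

$$\operatorname{var} g = 1, \qquad A_1^2 = n \operatorname{var} g = n,$$
$$\operatorname{var} h = E(x - \theta)^6 = 15, \qquad \operatorname{cov}(h, g) = E(x - \theta)^4 = 3,$$
$$A_2^2 = \frac{n \operatorname{cov}^2(h, g)}{\operatorname{var} h} = \frac{9n}{15} = 0.6\,n < A_1^2.$$

As a check on (19.17), $E(\partial h / \partial \theta) = -3E(x - \theta)^2 = -3 = -\operatorname{cov}(h, g)$.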
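Now if $\psi_\alpha$ is a value such that
$$\frac{1}{\sqrt{2\pi}} \int_0^{\psi_\alpha} e^{-\frac{1}{2}u^2}\, du = \tfrac{1}{2}\alpha,$$
the upper and lower confidence points for central intervals are $\pm \psi_\alpha$ and the values of $\theta$
are the solutions of
$$\frac{\sum g}{\sqrt{n \operatorname{var} g}} = \pm \psi_\alpha, \tag{19.19}$$
say $t_0$ and $t_1$. Similarly those for any function $h$ are given by
$$\frac{\sum h}{\sqrt{n \operatorname{var} h}} = \pm \psi_\alpha, \tag{19.20}$$
say $u_0$ and $u_1$. The equations for confidence points are equivalent to
$$\psi(t) = \pm \psi_\alpha,$$
$$\zeta(u) = \pm \psi_\alpha,$$
or, effectively, in large samples, by
$$\psi(\theta_0) + (t - \theta_0) \left(\frac{\partial \psi}{\partial \theta}\right)_{\theta_0} = \pm \psi_\alpha,$$
$$\zeta(\theta_0) + (u - \theta_0) \left(\frac{\partial \zeta}{\partial \theta}\right)_{\theta_0} = \pm \psi_\alpha,$$
where $\theta_0$ is a fixed value of $\theta$. When $t = \theta_0$ and $u = \theta_0$ we have $\psi(\theta_0) = \zeta(\theta_0)$. Hence
$$(t - \theta_0) \left(\frac{\partial \psi}{\partial \theta}\right)_{\theta_0} = (u - \theta_0) \left(\frac{\partial \zeta}{\partial \theta}\right)_{\theta_0}. \tag{19.21}$$
Now we have just shown that, on the average, $|\partial \psi / \partial \theta| > |\partial \zeta / \partial \theta|$. Hence, on the average,
$$|t - \theta_0| < |u - \theta_0|,$$
and the confidence limits $t$ are closer together than those of any member of the class $u$ for
any fixed value of $\theta$.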


19.13. A comparison of the result we have just proved and the properties of maximum
likelihood estimators in the limit will show the close relation between confidence
intervals and the theory of estimation developed in Chapter 17. In 17.27 we showed,
by considering the quantity $u = \partial \log L / \partial \theta$, that any estimator $t$ which is in the limit
distributed normally about the true value $\theta$ cannot have a variance less than
$$1 \Big/ \left[\, n\, E\left\{\left(\frac{\partial \log f}{\partial \theta}\right)^2\right\} \right],$$
and that the latter quantity, in the limit, is the variance of the maximum likelihood
estimator. It attains the minimal value when $u$ is constant over samples for which $t$ is constant.

The theorem of 19.12 shows that on the average the intervals determined by the
distribution of $u$ are shorter than those based on any other function with a zero mean value
(obeying the usual conditions as to continuity, etc.). Since the maximum likelihood
estimator has minimum variance, we should expect that confidence intervals based on its
distribution would be shorter than others; and this we now see to be so. For if $u$ is constant
over samples of constant $t$, the distribution of $u$ in all samples is equivalent to that of $t$.


Confidence Intervals and Sufficient Estimators 
19.14. Pursuing this line of thought, we are led to inquire whether sufficient
estimators provide confidence intervals for finite samples and whether they have any minimal
properties of the kind we have just established for large samples.
It is easy to see that sufficient estimators do in fact provide confidence intervals.
If $t$ is sufficient for $\theta$, the likelihood function may be put in the form
$$L = f_1(t, \theta)\, f_2(x_1, \ldots, x_n), \tag{19.22}$$
and the distribution of $t$, which involves $\theta$, is
$$dF = f_1(t, \theta)\, dt. \tag{19.23}$$
Given $\alpha$ we can then find $t_0$ and $t_1$ such that $F(t_0, \theta) = 1 - \alpha_0$ and $F(t_1, \theta) = \alpha_1$, and solve
for $\theta$ in terms of $t_0$ and $\alpha_0$, or $t_1$ and $\alpha_1$, as the case may be. This process will provide the
inequalities of the type we require, a proposition which we shall prove formally below
(19.25).


Example 19.5 
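In Example 17.8 we saw that
$$\hat{\theta} = \frac{\bar{x}}{p}$$
is sufficient for $\theta$ in the distribution
$$dF = \frac{x^{p-1} e^{-x/\theta}}{\Gamma(p)\, \theta^p}\, dx, \qquad 0 \le x \le \infty, \quad p > 0,$$
where $p$ is regarded as known. The distribution of $\hat{\theta}$ is in fact
$$dF = \left(\frac{np}{\theta}\right)^{np} \frac{\hat{\theta}^{\,np-1}\, e^{-np\hat{\theta}/\theta}}{\Gamma(np)}\, d\hat{\theta}.$$
The distribution function of $m = np\hat{\theta}/\theta$ is the incomplete $\Gamma$-function
$$\frac{\Gamma_m(np)}{\Gamma(np)} = I\left(\frac{m}{\sqrt{np}},\ np - 1\right).$$
We then find the values of $m$ corresponding to $\alpha_0$ and $\alpha_1$ from the tables, and have
$$P(m \le m_0) = \alpha_0,$$
$$P(m \ge m_1) = \alpha_1,$$
whence, since $m \le m_0$ is equivalent to $\theta \ge np\hat{\theta}/m_0$ and $m \ge m_1$ to $\theta \le np\hat{\theta}/m_1$,
$$P\left\{\frac{np\hat{\theta}}{m_0} \le \theta \le \frac{np\hat{\theta}}{m_1} \,\Big|\, \theta\right\} = \alpha_0 + \alpha_1 - 1 = \alpha.$$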
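
The limits of this example can be computed without tables when $np$ is a whole number, since the incomplete $\Gamma$-function then reduces to a finite sum. The sketch below is an illustration under that assumption and is not the author's method: the helper names are hypothetical, the quantiles are found by bisection, and the default choice $\alpha_0 = \alpha_1 = 0.975$ (central intervals with $\alpha = 0.95$) is an arbitrary one.

```python
from math import exp, log

def gamma_cdf(m, k):
    """P(M <= m) when M has the Gamma distribution with integer index k and unit scale."""
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= m / j
        total += term
    return 1.0 - exp(-m) * total

def gamma_quantile(prob, k):
    """Value m with P(M <= m) = prob, by bisection."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, k) < prob:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if gamma_cdf(mid, k) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def theta_limits(theta_hat, n, p, alpha0=0.975, alpha1=0.975):
    """Lower and upper limits n*p*theta_hat/m0 and n*p*theta_hat/m1 (n*p must be an integer)."""
    k = n * p
    m0 = gamma_quantile(alpha0, k)      # P(m <= m0) = alpha0
    m1 = gamma_quantile(1 - alpha1, k)  # P(m >= m1) = alpha1
    total = n * p * theta_hat
    return total / m0, total / m1

lower, upper = theta_limits(theta_hat=2.0, n=10, p=1)
print(round(lower, 3), round(upper, 3))
```

With $p = 1$ the parent distribution is the exponential, for which the quantile has the closed form $-\log(1 - P)$; this provides a simple check on the bisection.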


19.15. The position in regard to minimal properties of confidence intervals based
on sufficient estimators remains somewhat obscure, but one would expect some such
properties to hold even for finite $n$. Since $u = \partial \log L / \partial \theta$ is constant for constant $t$ when
$t$ is sufficient, the variance of $u$ will be a function of the variance of $t$. This, however, is
not necessarily enough to establish the fact that the corresponding confidence intervals are
shortest on the average. It is imaginable that the confidence intervals derived from its
distribution might be longer on the average than those of some other system. This seems
rather unlikely, at least for the ordinary distributions of statistical theory, but apparently
no proof has been given.


19.16. Neyman (1937b) has proposed to apply the phrase "shortest confidence
intervals" to sets of intervals defined in quite a different way. As it does not appear
that such intervals are necessarily the shortest in the sense of possessing the least length,
even on the average, we shall attempt to avoid confusion by calling them "most selective."

Consider a set of intervals $c_0$, typified by $\delta_0$, obeying the condition that
$$P\{\delta_0 \ c\ \theta \mid \theta\} = \alpha, \tag{19.24}$$
where we write $\delta_0 \ c\ \theta$ (that is, $\delta_0$ "contains" $\theta$) for the more usual $t_0 \le \theta \le t_1$ ($t_1 - t_0 = \delta_0$).
Let $c_1$ be some other set typified by $\delta_1$ such that
$$P\{\delta_1 \ c\ \theta \mid \theta\} = \alpha. \tag{19.25}$$
Either set is a permissible set of intervals, as the probability is $\alpha$ in both cases that the
range $\delta$ contains $\theta$.
If now for every $c_1$ we have, for any value $\theta'$ other than the true value,
$$P\{\delta_0 \ c\ \theta' \mid \theta\} \le P\{\delta_1 \ c\ \theta' \mid \theta\}, \tag{19.26}$$
$c_0$ is said to be most selective.


19.17. The ideas underlying this definition will be clearer from a reading of Chapters
26 and 27 dealing with the Neyman-Pearson theory of inference. We anticipate them here
to the extent of remarking that the object of most selective intervals is to cover the true
value with assigned probability $\alpha$, but to cover other values as little as possible. We may
say of both $c_0$ and $c_1$ that the assertion $\delta \ c\ \theta$ is true in proportion $\alpha$ of the cases. What
marks out $c_0$ for choice as the most selective set is that it covers false values less frequently
than the remaining sets.

The difference between this approach and the one leading to shortest intervals is that
the latter is concerned only with the narrowness of the confidence interval, whereas the
former gives weight to the frequency with which alternative values of $\theta$ are covered. One
concentrates on locating $\theta$ with the smallest margin of error; the other takes into account
the desirability of excluding so far as possible false values of $\theta$ from the interval, so that
mistakes of taking the wrong value are minimised.


19.18. Neyman himself has shown that most selective sets do not usually exist (for
instance, if the distribution is continuous) and has proposed two alternative systems:

(a) most selective one-sided systems (Neyman's "shortest one-sided" sets) which
obey (19.26) only for values of $\theta' - \theta$ which are always positive or always negative;

(b) selective unbiassed systems (Neyman's "short unbiassed" sets) which obey
(19.25) but, in place of (19.26), the further relation
$$P\{\delta \ c\ \theta \mid \theta\} = \alpha \ge P\{\delta \ c\ \theta' \mid \theta\}. \tag{19.27}$$

In essence these sets amount to a translation into terms of confidence intervals of
certain ideas in the theory of tests of significance, and we may defer consideration of them
until Chapters 26 and 27 are reached.


Generalisation to the Case of Several Parameters 


19.19. We now proceed to generalise the foregoing theory to the case of several
parameters. Although, to simplify the exposition, we shall deal in detail only with a single
variate, the theory is quite general. We begin by extending our notation and introducing
a geometrical terminology which may be regarded as an elaboration of the diagrams of
Figs. 19.1 and 19.2.

Suppose we have a frequency function of known form depending on $l$ unknown
parameters, $\theta_1, \ldots, \theta_l$, and denoted by $f(x, \theta_1, \ldots, \theta_l)$. We may require to estimate either
$\theta_1$ only or several of the $\theta$'s simultaneously. In the first place we consider only the
estimation of a single parameter. To determine confidence limits we require to find two functions
$u_0$ and $u_1$ dependent on the sample values but not on the $\theta$'s, such that
$$P\{u_0 \le \theta_1 \le u_1 \mid \theta_1, \ldots, \theta_l\} = \alpha, \tag{19.28}$$
where $\alpha$ is the confidence coefficient chosen in advance.
With a sample of $n$ values, $x_1, \ldots, x_n$, we can associate a point in an $n$-dimensional
Euclidean space, and the frequency-distribution will determine a density function for
each such point. The quantities $u_0$ and $u_1$, being functions of the $x$'s, are determined in
this space, and for any given $\alpha$ will lie on two hypersurfaces (the natural extension of the
confidence lines of Fig. 19.1). Between them will lie a Confidence Zone or Region of
Acceptance.

In general we also have to consider a range of values of $\theta$ which are a priori possible.
There will thus be an $l$-dimensional space of $\theta$'s subjoined to the $n$-space, the total region
of variation having $(l + n)$ dimensions; but if we are considering the estimation of $\theta_1$,
this reduces to an $(n + 1)$-space, the other $(l - 1)$ parameters not appearing as variables.

We shall call the sample-space $W$ and denote a point whose co-ordinates are $x_1, \ldots, x_n$
by $E$. We may then write $u_0(E)$, $u_1(E)$ to show that the confidence functions depend
on $E$. The interval $u_1(E) - u_0(E)$ we denote by $\delta(E)$ or $\delta$, and as above we write $\delta \ c\ \theta_1$
to denote $u_0 \le \theta_1 \le u_1$. The region of acceptance or confidence zone we denote by $A$,
and may write $E \in \delta$ or $E \in A$ to indicate that the sample-point lies in the interval $\delta$ or
the region $A$.

19.20. In Fig. 19.4 we have shown two axes $x_1$ and $x_2$ and a third axis corresponding
to the variation of $\theta_1$. The sample-space $W$ is thus two-dimensional. For any given
$\theta_1$, say $\theta_1'$, the space $W$ is a hyperplane (or part of it), one such being shown.


Fig. 19.4.


Take any given pair of values $(x_1, x_2)$ and draw through the point so defined a line
parallel to the $\theta_1$-axis, such as $PQ$ in the figure, cutting the hyperplane at $R$. The two
values of $u_0$ and $u_1$ will give two limits to $\theta_1$ corresponding to two points on this line, say
$U$, $V$. Consider now the lines $PQ$ as $x_1$, $x_2$ vary. In some cases $U$, $V$ will lie on opposite
sides of $R$, and $\theta_1'$ lies inside the interval $UV$. In other cases (as for instance in $U'V'$ shown
in the figure) the contrary is true. The totality of points in the former category determines
the region of acceptance $A$, shaded in the figure. If for any point in $A$ we assert
$\delta \ c\ \theta_1'$, we shall be right; if we assert it for points outside $A$ we shall be wrong.


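19.21. Evidently, if the sample-point $E$ falls in the region $A$, the corresponding
$\theta_1'$ lies in the confidence interval and conversely. It follows that the probability of any
fixed $\theta_1'$ lying in the confidence interval is the probability that $E$ lies in $A(\theta_1')$; or in
symbols:
$$P\{\delta \ c\ \theta_1' \mid \theta_1', \ldots, \theta_l\} = P\{u_0 \le \theta_1' \le u_1 \mid \theta_1', \ldots, \theta_l\} = P\{E \in A(\theta_1') \mid \theta_1', \ldots, \theta_l\}. \tag{19.29}$$
From this it follows that if the confidence functions are determined so that
$$P\{u_0 \le \theta_1 \le u_1 \mid \theta_1, \ldots, \theta_l\} = \alpha$$
we shall have, for all $\theta_1$,
$$P\{E \in A(\theta_1) \mid \theta_1, \ldots, \theta_l\} = \alpha. \tag{19.30}$$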


It follows also that for no $\theta_1$ can the region $A$ be empty, for if it were the probability in
(19.30) would be zero.

19.22. If the functions $u_0$ and $u_1$ are single-valued and determined for all $E$, then
any sample-point will fall into at least one region of acceptance. For on the line $PQ$
corresponding to the given $E$ we take an $R$ between $U$ and $V$, and this will define a value of
$\theta_1$, say $\theta_1'$, such that $E \in A(\theta_1')$.

More importantly, if a sample-point falls in the regions $A(\theta_1')$ and $A(\theta_1'')$ corresponding
to two values of $\theta_1$, $\theta_1'$ and $\theta_1''$, it will fall in the region $A(\theta_1''')$, where $\theta_1'''$ is any value
between $\theta_1'$ and $\theta_1''$. For we have
$$u_0 \le \theta_1' \le u_1, \qquad u_0 \le \theta_1'' \le u_1,$$
and hence
$$u_0 \le \theta_1' \le \theta_1'' \le u_1$$
if $\theta_1''$ is the greater, and hence
$$u_0 \le \theta_1' \le \theta_1''' \le \theta_1'' \le u_1$$
or
$$u_0 \le \theta_1''' \le u_1.$$
Further, if a sample-point falls in any of the regions $A(\theta_1)$ for the range of $\theta_1$-values
$\theta_1' < \theta_1 < \theta_1''$, it must also fall within $A(\theta_1')$ and $A(\theta_1'')$.


19.23. The conditions referred to in the two previous sections are necessary. We
now prove that they are sufficient, that is to say: if for each value of $\theta_1$ there is defined
in the sample-space $W$ a region $A$ such that

(1) $P\{E \in A(\theta_1) \mid \theta_1\} = \alpha$, whatever the value of the $\theta$'s;

(2) For any $E$ there is at least one $\theta_1$, say $\theta_1'$, such that $E \in A(\theta_1')$;

(3) If $E \in A(\theta_1')$ and $E \in A(\theta_1'')$, then $E \in A(\theta_1''')$ for any $\theta_1'''$ between $\theta_1'$ and $\theta_1''$;

(4) If $E \in A(\theta_1)$ for any $\theta_1$ satisfying $\theta_1' < \theta_1 < \theta_1''$, $E \in A(\theta_1')$ and $E \in A(\theta_1'')$;

then $u_0$ and $u_1$, viz. confidence limits for $\theta_1$, are given by taking the lower and upper bounds
of values of $\theta_1$ for which a fixed sample-point falls within $A(\theta_1)$. They are determinate
and single-valued for all $E$, $u_0 \le u_1$, and $P\{u_0 \le \theta_1 \le u_1 \mid \theta_1\} = \alpha$ for all $\theta_1$.

The lower and upper bounds exist in virtue of condition (2), and the lower is not greater
than the upper. We have then merely to show that $P\{u_0 \le \theta_1 \le u_1 \mid \theta_1\} = \alpha$, and for
this it is sufficient, in virtue of condition (1), to show that
$$P\{u_0 \le \theta_1 \le u_1 \mid \theta_1\} = P\{E \in A(\theta_1) \mid \theta_1\}. \tag{19.31}$$
We already know that if $E \in A(\theta_1)$ then $u_0 \le \theta_1 \le u_1$; and our result will be established
if we demonstrate the converse.

Suppose it is not true that when $u_0 \le \theta_1 \le u_1$, $E \in A(\theta_1)$. Let $E'$ be a point outside
$A(\theta_1)$ for which $u_0 \le \theta_1 \le u_1$. Then must either $u_0 = \theta_1$ or $u_1 = \theta_1$ or both; for otherwise,
$u_0$ and $u_1$ being the bounds of the values of $\theta_1$ for which $E$ lies in $A(\theta_1)$, there would
exist values $\theta_1'$ and $\theta_1''$, such that $E \in A(\theta_1')$ and $E \in A(\theta_1'')$ and
$$u_0 \le \theta_1' < \theta_1 < \theta_1'' \le u_1,$$
so that, from condition (3), $E \in A(\theta_1)$, which is contrary to assumption.

Thus $u_0 = \theta_1$ or $u_1 = \theta_1$ or both. If both, then $E$ must fall in $A(\theta_1)$, for $u_0$ and $u_1$
are the bounds of $\theta$-values for which this is so, and if they coincide their common value
must be so. Finally, if $u_0 = \theta_1 < u_1$ (and similarly if $u_0 < \theta_1 = u_1$) we see that for
$u_0 < \theta_1''' < u_1$, $E$ must fall in $A(\theta_1''')$ from condition (3), and hence, from condition (4), $E$
must fall in $A(\theta_1')$ and $A(\theta_1'')$ where $\theta_1' = u_0$ and $\theta_1'' = u_1$. Hence it falls in $A(\theta_1)$.
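
The construction used in this theorem, taking $u_0$ and $u_1$ as the lower and upper bounds of the parameter values whose regions of acceptance contain the observed sample-point, can be sketched for the trivial case $n = 1$ of Fig. 19.1. Everything in the sketch is an assumption made for illustration: the region is taken to be $A(\theta) = \{x : |x - \theta| \le 2\}$, the parameter is scanned over a finite grid, and the function names are hypothetical.

```python
def in_region(x, theta, half_width=2.0):
    """True when the sample-point x lies in A(theta) = {x : |x - theta| <= half_width}."""
    return abs(x - theta) <= half_width

def limits_from_regions(x, grid):
    """Lower and upper bounds of the grid values theta for which x falls in A(theta)."""
    accepted = [theta for theta in grid if in_region(x, theta)]
    if not accepted:
        # condition (2) of the theorem fails on this grid
        raise ValueError("no region of acceptance contains the sample-point")
    return min(accepted), max(accepted)

# Parameter values from -10 to 10 in steps of 0.01
grid = [i / 100 for i in range(-1000, 1001)]
u0, u1 = limits_from_regions(1.0, grid)
print(u0, u1)
```

For an observed value of 1 the bounds reproduce, to the resolution of the grid, the ordinates $\bar{x} - 2$ and $\bar{x} + 2$ read from the two confidence lines.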


19.24. The foregoing theorem gives us a formal solution of the problem of finding
confidence intervals in the general case, but it does not provide a method of finding the
intervals in particular instances. In practice we have three lines of approach: (1) to use
sufficient estimators, (2) to adopt the process known as "studentisation," and (3) to
"guess" a set of intervals in the light of general knowledge and experience and to verify
that they do or do not satisfy the required conditions.


19.25. Consider the use of sufficient estimators in the general case. If $t_1$ is sufficient
for $\theta_1$ we have

$$L = L_1(t_1, \theta_1)\, L_2(x_1, \ldots, x_n, \theta_2, \ldots, \theta_l). \qquad (19.32)$$
The locus $t_1$ = constant determines a series of hypersurfaces in the sample-space $W$. If
we regard these hypersurfaces as determining regions in $W$, then $t_1 \le k$, say, determines
a fixed region $K$. The probability that $E$ falls in $K$ is then clearly dependent only on
$t_1$ and $\theta_1$. By appropriate choice of $k$ we can determine $K$ so that

$$P\{E \in K \mid \theta_1\} = \alpha,$$

and hence set up regions of acceptance based on values of $t_1$. We can do so, moreover,
in an infinity of ways, according to the values selected for a» and «. 


Studentisation 


19.26. In Example 19.1 we considered a simplified problem of estimating the mean
in samples from a normal population with unit variance. Suppose now that we require
to determine confidence limits for the mean $\mu$ in samples from

$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\} dx.$$

The approach of Example 19.1 would lead us to the conclusion that, for confidence coefficient
0.9545 and central intervals,

$$P\left\{\bar{x} - \frac{2\sigma}{\sqrt{n}} \le \mu \le \bar{x} + \frac{2\sigma}{\sqrt{n}}\right\} = 0.9545.$$

But we cannot now say that the confidence limits are $\bar{x} \pm 2\sigma/\sqrt{n}$ because $\sigma$ is unknown.

Consider then the distribution of $z = (\bar{x} - \mu)/s$, where $s^2$ is the sample variance. This
is known to be the “Student” form

$$dF = \frac{k\,dz}{(1+z^2)^{n/2}}.$$

(Cf. Example 10.6, vol. I, p. 239.) Given $\alpha$, we can now find $z_0$ and $z_1$ such that

$$\int_{-z_1}^{z_0} dF = \alpha$$

and hence

$$P\{-z_1 \le z \le z_0\} = \alpha,$$

which is equivalent to

$$P\{\bar{x} - s z_0 \le \mu \le \bar{x} + s z_1\} = \alpha.$$

Hence we may say that $\mu$ lies in the range $\bar{x} - s z_0$ to $\bar{x} + s z_1$ with confidence coefficient
$\alpha$, the range now being independent of either $\mu$ or $\sigma$. In fact, owing to the symmetry of
“Student's” distribution, $z_0 = z_1$, but this is an accidental circumstance peculiar to the
present case.
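
The studentised limits can be checked numerically. What follows is a minimal sketch in Python and not part of the original text: the function names are illustrative assumptions, $s$ is taken with divisor $n$, and the constant $k$ is avoided by normalising the density $(1+z^2)^{-n/2}$ numerically after the substitution $z = \tan\phi$.

```python
import math


def _simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def central_mass(z0, n):
    """P(-z0 <= z <= z0) when the density of z is proportional to (1 + z^2)^(-n/2).

    With z = tan(phi) the density becomes proportional to cos(phi)^(n - 2)
    on (-pi/2, pi/2), which is smooth and easy to integrate.
    """
    g = lambda phi: math.cos(phi) ** (n - 2)
    return _simpson(g, 0.0, math.atan(z0)) / _simpson(g, 0.0, math.pi / 2)


def z_limit(alpha, n):
    """Solve central_mass(z0, n) = alpha for z0 by bisection."""
    lo, hi = 0.0, 1.0
    while central_mass(hi, n) < alpha:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if central_mass(mid, n) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def studentised_limits(sample, alpha):
    """Central limits mean -/+ s*z0, with s^2 = sum((x - mean)^2) / n."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    z0 = z_limit(alpha, n)
    return mean - s * z0, mean + s * z0
```

A call such as `studentised_limits([1.0, 2.0, 3.0, 4.0], 0.9)` returns the pair of limits for the mean. Since $z\sqrt{n-1}$ is “Student's” $t$ with $n - 1$ degrees of freedom, the value returned by `z_limit` can be compared with tabulated values of $t$.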


19.27. The possibility of finding confidence intervals in this case arose from our
being able to find a statistic $z$, depending only on the parameter under estimate, whose
distribution did not contain $\sigma$. A scale parameter can often be eliminated in this way,
although the resulting distributions are not always easy to handle. If, for instance, we
have a statistic $t$ which is of degree $p$ in the variables, then $t/s^p$ is of degree zero, and its
distribution must be independent of the scale parameter. When a statistic is reduced
to independence of the scale in this way it is said to be “studentised,” after “Student”
(W. S. Gosset), who was the first to perceive the significance of the process.


19.28. It is interesting to consider the relation between the studentised mean-statistic
and confidence zones based on sufficient estimators in the normal case. The
distribution of means and variances in normal samples is

$$dF = \frac{\sqrt{n}}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{n}{2\sigma^2}(\bar{x}-\mu)^2\right\} d\bar{x} \cdot \frac{n^{(n-1)/2}}{2^{(n-3)/2}\,\Gamma\{\tfrac{1}{2}(n-1)\}\,\sigma^{n-1}}\, s^{n-2} \exp\left(-\frac{n s^2}{2\sigma^2}\right) ds \qquad (19.33)$$

and $\bar{x}$, $s$ are jointly sufficient for $\mu$, $\sigma$. In the sample space $W$ the regions of constant $\bar{x}$
are hyperplanes and those of constant $s$ are hyperspheres. If we fix $\bar{x}$ and $s$ the sample-point
$E$ lies on a hypersphere of $(n - 2)$ dimensions. Choose an area on this hypersphere
of content $\alpha$. Then the acceptance region will be obtained by combining all such areas
for all $\bar{x}$ and $s$.

One such region is seen to be the “slice” of the sample-space obtained by rotating
the hyperplane passing through the origin and the point $(1, 1, \ldots, 1)$ through an angle
$\pi\alpha$ (not $2\pi\alpha$, because a half-turn of the plane covers the whole space).

The situation is illustrated for $n = 2$ in Fig. 19.5.


Fig. 19.5.


For any given $\mu'$ the axis of rotation meets the hyperplane $\mu = \mu'$ in the point
$x_1 = x_2 = \mu'$, and the hypercones $z$ = constant in the $W$ space become the plane
areas between two straight lines (shaded in the figure). These may be regarded as regions
of acceptance, and one set is that obtained by rotating a plane about the line $x_1 = x_2 = \mu$
through an angle so as to cut off in any plane $\mu = \mu'$ an angle $\tfrac{1}{2}\beta$ on each side of
$x_1 - \mu' = x_2 - \mu'$.
The boundary planes are given by

$$x_1 - \mu = (x_2 - \mu) \tan\left(\frac{\pi}{4} - \frac{\beta}{2}\right)$$

$$x_1 - \mu = (x_2 - \mu) \tan\left(\frac{\pi}{4} + \frac{\beta}{2}\right)$$

where $\beta = \pi(1 - \alpha)$; or, after a little reduction,

$$\mu = \frac{x_1 + x_2}{2} \pm \frac{|x_1 - x_2|}{2} \cot\frac{\beta}{2}.$$

$\mu$ then lies in the region of acceptance if

$$\frac{x_1 + x_2}{2} - \frac{|x_1 - x_2|}{2} \cot\frac{\beta}{2} \le \mu \le \frac{x_1 + x_2}{2} + \frac{|x_1 - x_2|}{2} \cot\frac{\beta}{2}.$$

These are in fact the limits given by “Student's” distribution for $n = 2$, since the sample
variance then becomes $\{\tfrac{1}{2}(x_1 - x_2)\}^2$ and

$$\frac{1}{\pi}\int_{z_0}^{\infty} \frac{dz}{1+z^2} = \frac{1}{2} - \frac{1}{\pi}\tan^{-1} z_0 = \frac{1-\alpha}{2} = \frac{\beta}{2\pi},$$

so that

$$z_0 = \tan\left(\frac{\pi}{2} - \frac{\beta}{2}\right) = \cot\frac{\beta}{2}.$$
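
The case $n = 2$ lends itself to a direct check. The sketch below is illustrative only (the function names and the simulation settings are assumptions, not part of the text): it computes the limits $\tfrac{1}{2}(x_1 + x_2) \pm \tfrac{1}{2}|x_1 - x_2| \cot \tfrac{1}{2}\beta$, with $\beta = \pi(1 - \alpha)$, and estimates by simulation how often they cover the true mean of a normal population.

```python
import math
import random


def limits_n2(x1, x2, alpha):
    """Limits for the mean from a sample of two, using z0 = cot(beta/2), beta = pi*(1 - alpha)."""
    beta = math.pi * (1.0 - alpha)
    half_width = 0.5 * abs(x1 - x2) / math.tan(0.5 * beta)
    centre = 0.5 * (x1 + x2)
    return centre - half_width, centre + half_width


def coverage(mu, sigma, alpha, trials=100_000, seed=1):
    """Fraction of simulated normal samples of two whose limits contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        lo, hi = limits_n2(rng.gauss(mu, sigma), rng.gauss(mu, sigma), alpha)
        hits += lo <= mu <= hi
    return hits / trials
```

The estimated coverage should lie close to $\alpha$ whatever values of `mu` and `sigma` are supplied, which is the sense in which the range is independent of both parameters.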


19.29. Tables or diagrams of the confidence intervals for selected values of $\alpha$ have
been given for the following parameters:

(a) the proportion $\varpi$ in the binomial (Clopper and Pearson, 1934);

(b) the parameter of the Poisson distribution (Garwood, 1936; Ricker, 1937);

(c) the correlation coefficient in normal samples (David, 1938a);

(d) the median in samples from any population (K. R. Nair, 1940b).

In addition, results for the mean of a normal population may be obtained from “Student's”
integral as shown above. Those for the variance of a normal population may be obtained
from the $\Gamma$-function or the equivalent $\chi^2$-integral. For simultaneous estimation of mean
and variance there are difficulties, as we proceed to show.


19.30. It might have been expected that the foregoing theory could be generalised
to give simultaneous pairs of confidence intervals for two unknown parameters when
intervals for each separately cannot be found. Very little progress in this direction has,
however, been made. The difficulty may be illustrated by reference to the joint distribution
of mean and variance (19.33). From the independent distributions of $(\bar{x} - \mu)/\sigma$ and
$s/\sigma$ we can, given $\alpha$, $\beta$, find $t_0$, $t_1$ and $u_0$, $u_1$ such that

$$P\left\{-t_0 \le \frac{\bar{x} - \mu}{\sigma} \le t_1\right\} = \alpha$$

$$P\left\{u_0 \le \frac{s}{\sigma} \le u_1\right\} = \beta$$

where the $t$'s and $u$'s depend only on sample values and $\alpha$, $\beta$ may be chosen at will. The
inequalities are equivalent to


ot |b Se Ch me ee ey sl ye (19.34) 
c RR ir. (19.85) 
Uy Uo 
and these give 
- to E t z 
— — S8. 5 . f . (19.36 
& PESES ee (19.36) 
But can we then infer that 
- to 4 ü 
=F. — = . > (10.37; 
E FE. «u SI ys ( ) 


where $\gamma$ is a constant dependent on $\alpha$ and $\beta$? We cannot. This equation is, in fact,
not generally true. The fact can be verified by considering the distribution of the statistic
$\bar{x} - ks$ and showing that its distribution function $F(u)$ is not independent of $\mu$ and $\sigma$.


19.31. In the next chapter we shall see that a similar problem, giving rise to Behrens'
test, provides a crucial point of difference between the theory of confidence intervals and
that of fiducial intervals. All we need say here is that from the point of view of the former
the problem of simultaneous confidence intervals for several parameters remains unsolved,
except of course in the degenerate case when we can find independent intervals for each
parameter separately.


19.32. In conclusion we indicate without proof a few results which have recently
been obtained.

(1) Wilks and Daly (1939b) have generalised the theorem of 19.12 to the case of several
parameters. Under fairly general conditions the confidence regions which are shortest
on the average are given by

$$\sum_{i,j} a^{ij}\, \frac{\partial \log L}{\partial \theta_i}\, \frac{\partial \log L}{\partial \theta_j} \le \chi_\alpha^2,$$

where $(a^{ij})$ is the inverse matrix to that whose general element is

$$a_{ij} = E\left(\frac{\partial \log L}{\partial \theta_i}\, \frac{\partial \log L}{\partial \theta_j}\right)$$

and $\chi_\alpha^2$ is such that $P(\chi^2 \le \chi_\alpha^2) = \alpha$, the probability being calculated from the $\chi^2$-distribution
with $\nu = l$. This is clearly related to the result of 17.46 giving the limiting forms
of variances and covariances of maximum likelihood estimators.

(2) Wald (1942) has considered the problem of large samples from the point of view
of most selective sets (“shortest” in Neyman's sense) and has proved results somewhat
similar to those of Wilks and Daly.

(3) Wald and Wolfowitz (1939b, 1941c) and Kolmogoroff (1941) have considered the
problem of setting confidence limits to the terminals of an unknown frequency-distribution.


NOTES AND REFERENCES 


When the theory of confidence intervals and that of fiducial intervals were first developed
many statisticians regarded them as equivalent. In papers written between 1930
and 1938 “confidence limits” and “fiducial limits” are often used in the same sense;
and even where a distinction of approach was drawn the results given by the two methods
appeared identical. The case of Behrens’ test, however, provided an illustration where
the methods lead to different results (see the following chapter).

The fiducial approach is due to R. A. Fisher, references being given at the end of
Chapter 20. The approach of the present chapter has been developed mainly by Neyman
(see particularly 1937b), E. S. Pearson, Wilks (1938b, c, 1939a and, with Daly, 1939b),
Wald (1939a, 1942), Welch (1939a), and Bartlett (1936a, 1939a). A number of the references
to Chapters 26 and 27 are also relevant.

Confidence intervals can be obtained for the median and other quantiles which are
independent of the form of distribution. See Thompson (1936), Savur (1937a) and K. R.
Nair (1940b), and compare Exercise 19.5.
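
The distribution-free interval for the median can be illustrated as follows. This is a hedged sketch rather than anything given in the references: it assumes a continuous parent distribution, uses the fact that the number of sample values below the median is then binomial with parameters $n$ and $\tfrac{1}{2}$, and the function names are hypothetical.

```python
import math


def median_interval_coverage(n, k):
    """P(x_(k) <= M <= x_(n-k+1)) for a continuous parent, M being the median.

    Each tail has probability P(Bin(n, 1/2) <= k - 1).
    """
    tail = sum(math.comb(n, j) for j in range(k)) / 2 ** n
    return 1.0 - 2.0 * tail


def median_interval(sample, alpha):
    """Narrowest symmetric pair of order statistics with coverage at least alpha."""
    xs = sorted(sample)
    n = len(xs)
    best = None
    for k in range(1, n // 2 + 1):
        if median_interval_coverage(n, k) >= alpha:
            best = k
        else:
            break
    if best is None:
        raise ValueError("sample too small for the requested coefficient")
    return xs[best - 1], xs[n - best], median_interval_coverage(n, best)
```

Because only the order statistics are available as limits, the attained coefficient is generally greater than the $\alpha$ requested; the third value returned reports it.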


EXERCISES 
19.1. Show that for the rectangular population

$$dF = \frac{dx}{\theta}, \qquad 0 \le x \le \theta$$

and confidence coefficient $\alpha$, confidence limits for $\theta$ are $t$ and $t/\psi$ where $t$ is the sample range
and $\psi$ is given by

$$\psi^{n-1}\{n - (n-1)\psi\} = 1 - \alpha.$$

(Wilks, 1938c.)


19.2. Show that, for the distribution of the previous exercise, confidence limits
for samples of two, $x_1$ and $x_2$, are

$$\frac{x_1 + x_2}{1 + \sqrt{1 - \alpha}}, \qquad \frac{x_1 + x_2}{1 - \sqrt{1 - \alpha}}.$$


(Neyman, 19370.) 


19.3. Show also, in the case of the previous exercises, that if $L$ is the larger of a
sample of two, confidence limits are

$$L, \qquad \frac{L}{\sqrt{1 - \alpha}}.$$

(Neyman, 1937.)
Show further that if $M$ is the largest of samples of four, confidence limits are

$$M, \qquad \frac{M}{(1 - \alpha)^{1/4}}.$$

(For an experimental verification, see Frankel and Kullback, 1940.)


19.4. Show that, for the distribution

$$dF = \theta e^{-\theta x}\, dx, \qquad 0 \le x \le \infty,$$

central confidence limits for large samples with $\alpha = 0.95$ are given by

$$\frac{1}{\bar{x}}\left(1 \pm \frac{1.96}{\sqrt{n}}\right).$$


(Wilks, 1938c.) 


19.5. If a frequency function is continuous, the probability that the $k$th of a sample
of $n$ (arranged in ascending order of magnitude) lies in the range $dx$ is

$$\frac{1}{B(k, n-k+1)}\, F^{k-1} (1 - F)^{n-k}\, dF,$$

where $F$ is the distribution function. Deduce that

$$P(x_k \le M \le x_{n-k+1}) = 1 - 2 I_{0.5}(n - k + 1, k),$$

where $M$ is the median, and hence show how to determine confidence intervals for $M$ from
the incomplete B-function.
Generalise the result for quantiles. Show that the results do not hold for discontinuous
distributions.
(Thompson, 1936.) 


CHAPTER 20 
FIDUCIAL INFERENCE 


20.1. We now proceed to examine a type of inference known as fiducial. As in
other methods of estimation, given a distribution of known form depending on an unknown
parameter $\theta$, we shall attempt to find limits between which $\theta$ lies in some sense associated
with the theory of probability. To that extent our present approach is similar to the
use of estimators with their associated sampling error and to the use of confidence intervals;
but it is distinct from the latter both in essential ideas and in some of the results to which
it leads.


20.2. Consider samples of $n$ from a normal population of unknown mean $\mu$ and
unit variance. The sample-mean $\bar{x}$ is sufficient for $\mu$ and its distribution is

$$dF = \sqrt{\frac{n}{2\pi}} \exp\left\{-\frac{n}{2}(\bar{x} - \mu)^2\right\} d\bar{x}. \qquad (20.1)$$

In speaking of a distribution in this sense we regard $\mu$ as fixed and consider the totality
of values of $\bar{x}$ derived by random sampling from the population with given $\mu$. The proportion
of samples falling in a range $d\bar{x}$ is then given by (20.1), which holds for each
value of $\mu$.

We now change our viewpoint and consider a different kind of distribution based on
(20.1). If we are given a value of $\bar{x}$ from a sample, what are the values of $\mu$ which could
have given rise to this value to any fixed level of probability? If the deviation $\bar{x} - \mu$ is
written as $h$, we know that the probability of the inequality

$$\bar{x} - \mu \le h \qquad (20.2)$$

being true is $\alpha$, where $\alpha$ depends on $h$ and is in fact

$$\alpha = \sqrt{\frac{n}{2\pi}} \int_{-\infty}^{h} \exp\left(-\frac{n x^2}{2}\right) dx. \qquad (20.3)$$

Looking at this the other way round, we may say that given any $\alpha$ we can find $h$, a function
of $\alpha$ only, such that

$$\mu \ge \bar{x} - h \qquad (20.4)$$

is true with probability $\alpha$. For any fixed $\bar{x}$ this gives us a distribution of $\mu$. Consider
in fact the equation

$$\mu = \bar{x} - h. \qquad (20.5)$$

If $\mu$ has a distribution function $F(\mu)$, we have, since (20.4) is true with probability $\alpha$,

$$1 - \alpha = F(\mu) = 1 - \sqrt{\frac{n}{2\pi}} \int_{-\infty}^{h} \exp\left(-\frac{n x^2}{2}\right) dx,$$

whence

$$f(\mu)\, d\mu = -\sqrt{\frac{n}{2\pi}} \exp\left(-\frac{n h^2}{2}\right) dh.$$

But in virtue of (20.5), $d\mu = -dh$ and $h = \bar{x} - \mu$. Thus

$$f(\mu)\, d\mu = \sqrt{\frac{n}{2\pi}} \exp\left\{-\frac{n}{2}(\mu - \bar{x})^2\right\} d\mu. \qquad (20.6)$$

This is called the fiducial distribution of $\mu$.


20.3. It so happens that in this example the non-differential parts of (20.6) and
(20.1) are the same. This is not essential although it is not infrequent. The crucial
point of difference, however, lies in the appearance of the differential element $d\mu$, relating
to the variation of $\mu$, and the disappearance of $d\bar{x}$ relating to the variation of $\bar{x}$. We have
derived a distribution of the parameter from that of the random variable $\bar{x}$ by transferring
our attention in (20.4) from $\bar{x}$ to $\mu$ and regarding the inequality as still satisfied
with probability $\alpha$.


20.4. We note in the first place that this distribution is not necessarily existent.
When we come to make an inference in any particular case we do not assume that $\mu$ is
itself distributed in the fiducial form in the sense that it has been chosen at random from
an existent population of $\mu$'s of that form. Such a prior distribution, which would be
required for the application of Bayes' theorem, is not admissible from the point of view
of the frequency theory of probability. The fiducial distribution is a hypothetical one of
conceivable values of $\mu$. We attach probabilities to these values, or rather to values in the
range $d\mu$, by identifying them with the probabilities (based on frequency) which are derived
from the distribution of a sufficient estimator of $\mu$. For this reason the fiducial distribution
is not a frequency-distribution in the ordinary sense; but it is a probability distribution
in its own special sense. We use it to make statements of the kind: among the values
of $\mu$ which are possible, only those in a certain range give rise to the observed $\bar{x}$ with
probability $\alpha$, and hence we will locate $\mu$ in that range.


20.5. In our present example the argument would proceed as follows. From equation
(20.6) and the use of the normal integral, the probability that $\mu - \bar{x}$ does not exceed a
certain $h$ is ascertainable as a function of $h$; for instance,

$$P\{\mu - \bar{x} \le 2/\sqrt{n}\} = 0.97725.$$

If we regard a probability as high as this as acceptable, we may say that $\mu \le \bar{x} + 2/\sqrt{n}$.

This result is equivalent to that given by the theory of confidence intervals, for if we
assert $\mu \le \bar{x} + 2/\sqrt{n}$ we shall be right in the long run in 97.725 per cent. of the cases. This
identity of result is found in most elementary cases where a single parameter is concerned,
but is to be regarded as accidental. In the theory of confidence intervals it is fundamental
(a) that the assertion as to the parameter lying in a given range should be true in an assigned
proportion $\alpha$ of the cases, and (b) that no assumption need be made as to the prior distribution
of the parameter, either in the frequency sense or in the fiducial sense. In fiducial
theory it is not necessary that (a) should be true, but the fiducial distribution is
a fundamental part of the inference.
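
The numerical statement above, and its long-run interpretation, can be reproduced with the standard library. The sketch below is illustrative only; the function names and simulation settings are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist


def fiducial_upper_limit(xbar, n, prob):
    """Value m with P(mu <= m) = prob under the fiducial form N(xbar, 1/n)."""
    return xbar + NormalDist().inv_cdf(prob) / math.sqrt(n)


def long_run_frequency(mu, n, h, trials=100_000, seed=3):
    """Fraction of simulated samples for which mu <= xbar + h/sqrt(n) is true.

    The sample mean is drawn directly from N(mu, 1/n) (unit parent variance).
    """
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(n)
    hits = sum(mu <= rng.gauss(mu, sd) + h * sd for _ in range(trials))
    return hits / trials
```

Here `NormalDist().cdf(2.0)` gives the probability quoted in the text, and `long_run_frequency(mu, n, 2.0)` estimates the proportion of cases in which the assertion is right, which in this example agrees with the fiducial probability.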


20.6. There is a further distinction between the two theories. In that of confidence
intervals it is possible to have two entirely different sets for the same parameter, and in
fact part of that theory is devoted to finding “best” sets among the possible ones. In
fiducial theory such a state of affairs must not be possible, for different limits would imply
different fiducial distributions for the same parameter on the same evidence. This is avoided
by confining fiducial distributions to those based on sufficient estimators, or more generally
on a set of estimators which together avoid all loss of information. Since such estimators
alone contain all the information relevant to the problem of estimation they alone can
give the fiducial distributions accurately. It follows, of course, that where no sufficient
estimator (or estimator with complete set of ancillary estimators) can be found, the
fiducial method is inapplicable.


20.7. Generally, let $F(t, \theta)$ be the distribution function of a sufficient estimator $t$
for a parameter $\theta$. Then for the frequency distribution of $t$ we have

$$dF = \frac{\partial F(t, \theta)}{\partial t}\, dt. \qquad (20.7)$$

$F(t, \theta)$ is the probability that a random value of the estimator does not exceed a given
value $t$. In accordance with the fiducial principle, this may be equated to the probability
that for fixed $t$ the parameter will exceed the value $\theta$, so that for the fiducial distribution
of $\theta$ we have

$$f(\theta)\, d\theta = -\frac{\partial}{\partial \theta}\{F(t, \theta)\}\, d\theta = -\frac{\partial F(t, \theta)}{\partial \theta}\, d\theta. \qquad (20.8)$$

This shows the general relation between the frequency-distribution of the estimator and
the fiducial distribution of the parameter.
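
The general relation, that the fiducial density of the parameter is minus the derivative of the estimator's distribution function with respect to the parameter, can be checked numerically in the normal case of 20.2. This is a minimal sketch under that assumption (unit parent variance, sample mean as estimator); the function names and the step size are illustrative choices.

```python
import math
from statistics import NormalDist


def estimator_cdf(t, theta, n):
    """F(t, theta): probability that the mean of n unit-variance normal values does not exceed t."""
    return NormalDist(mu=theta, sigma=1.0 / math.sqrt(n)).cdf(t)


def fiducial_density(theta, t, n, step=1e-5):
    """-dF/dtheta at fixed t, approximated by a central difference."""
    upper = estimator_cdf(t, theta + step, n)
    lower = estimator_cdf(t, theta - step, n)
    return -(upper - lower) / (2.0 * step)
```

For fixed `t` the values returned agree, to within the error of the finite difference, with the normal density of mean `t` and variance $1/n$ evaluated at `theta`, i.e. with the fiducial form obtained directly in 20.2.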


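Example 20.1


If $p$ is known, the estimator $\hat{\theta} = \bar{x}/p$ is sufficient for $\theta$ in samples from

$$dF = \frac{x^{p-1} e^{-x/\theta}}{\theta^p\, \Gamma(p)}\, dx, \qquad 0 \le x \le \infty,$$

the distribution of $\hat{\theta}$ being, in fact,

$$dF = \left(\frac{np}{\theta}\right)^{np} \frac{\hat{\theta}^{np-1}}{\Gamma(np)} \exp\left(-\frac{np\,\hat{\theta}}{\theta}\right) d\hat{\theta}.$$

(Cf. Example 17.8.) We may write this in the form

$$dF = \frac{(np)^{np}}{\Gamma(np)} \left(\frac{\hat{\theta}}{\theta}\right)^{np} \exp\left(-\frac{np\,\hat{\theta}}{\theta}\right) \frac{d\hat{\theta}}{\hat{\theta}}. \qquad (20.9)$$

It is then clear that, since the distribution function depends on $\hat{\theta}$ and $\theta$ only through
the ratio $\hat{\theta}/\theta$, so that

$$\frac{\partial F}{\partial \theta} = -\frac{\hat{\theta}}{\theta}\, \frac{\partial F}{\partial \hat{\theta}},$$

the corresponding fiducial distribution of $\theta$ is

$$dF = \frac{(np\,\hat{\theta})^{np}}{\Gamma(np)\, \theta^{np+1}} \exp\left(-\frac{np\,\hat{\theta}}{\theta}\right) d\theta, \qquad (20.10)$$

which may also be put in the form (20.9), provided that we interpret the differential element
now as relating to $\theta$ and not to $\hat{\theta}$. It will be noticed that we have replaced $d\hat{\theta}/\hat{\theta}$ by $d\theta/\theta$,
not merely by $d\theta$.

From the fiducial distribution (20.10) we can find the probability that $\theta$ lies in a certain
range dependent on the observed $\hat{\theta}$ and the chosen probability $\alpha$. This is in fact the same
range that we should obtain by applying confidence intervals to (20.9). Once again the
results of the two methods are the same.

Fiducial Inference based on “Student's” Distribution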


20.8. Consider now the estimation of the mean in samples from a normal population
with unknown variance $\sigma^2$. The treatment of 20.2 is no longer of use, for it would
result in a fiducial distribution of $\mu$ containing the unknown $\sigma$. We therefore “studentise”
the problem by considering the distribution of

$$t = \frac{(\bar{x} - \mu)\sqrt{n}}{s'} \qquad (20.11)$$

which is independent of $\sigma$, being in fact

$$dF \propto \frac{dt}{\left(1 + \dfrac{t^2}{\nu}\right)^{(\nu+1)/2}} \qquad (20.12)$$

where $\nu = n - 1$. Here $s'^2$ is the unbiassed estimate of the parent variance

$$s'^2 = \frac{1}{n-1}\, \Sigma (x - \bar{x})^2.$$

The distribution of $t$ may be written

$$dF \propto \frac{dt}{\left\{1 + \dfrac{n(\bar{x} - \mu)^2}{s'^2 (n-1)}\right\}^{n/2}}. \qquad (20.13)$$

The fiducial distribution is then

$$dF \propto \frac{d\mu}{\left\{1 + \dfrac{n(\mu - \bar{x})^2}{s'^2 (n-1)}\right\}^{n/2}}. \qquad (20.14)$$

In the usual way we can find two constants, for any given $\alpha$, such that, from (20.14),

$$P\{\mu_0 \le \mu \le \mu_1\} = \alpha, \qquad (20.15)$$

the probability being based on (20.14) and therefore to be understood in the fiducial sense.
Had we worked with (20.12) or (20.13) we should have found $t_0$, $t_1$ such that

$$P\{-t_1 \le t \le t_0\} = \alpha,$$

which is equivalent to

$$P\left\{\bar{x} - \frac{s' t_0}{\sqrt{n}} \le \mu \le \bar{x} + \frac{s' t_1}{\sqrt{n}}\right\} = \alpha. \qquad (20.16)$$

This may be interpreted in the sense of confidence intervals, i.e. that in asserting the
inequality in (20.16) we should be right in a proportion $\alpha$ of the cases in the long
run. (20.15) does not rest on this statement as to frequency, though the limits to which
it leads are the same and the statement happens to be true.


20.9. The case we have just discussed raises a new point. Is it still true that
the fiducial distribution is unique, and is it consistent with the distributions of $\mu$ and $\sigma$
separately? The distribution is based only on the sufficient estimators $\bar{x}$ and $s'$ (which
are jointly but not separately sufficient for $\mu$ and $\sigma$) and we should expect this to be so.
But the matter requires investigation, for we are here using a fiducial distribution based on
two estimators.

The simultaneous distribution of $\bar{x}$ and $s'$ is


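$$dF \propto \frac{1}{\sigma} \exp\left\{-\frac{n}{2\sigma^2}(\bar{x} - \mu)^2\right\} d\bar{x}\; \left(\frac{s'}{\sigma}\right)^{n-2} \exp\left\{-\frac{(n-1)\, s'^2}{2\sigma^2}\right\} \frac{ds'}{\sigma}. \qquad (20.17)$$

If we were considering fiducial limits for $\mu$ with known $\sigma$ we should use the distribution

$$dF \propto \frac{1}{\sigma} \exp\left\{-\frac{n}{2\sigma^2}(\bar{x} - \mu)^2\right\} d\bar{x}.$$

If we were considering fiducial limits for $\sigma$ with known $\mu$ we should not use the other factor
in (20.17),

$$dF \propto \left(\frac{s'}{\sigma}\right)^{n-2} \exp\left\{-\frac{(n-1)\, s'^2}{2\sigma^2}\right\} \frac{ds'}{\sigma}, \qquad (20.18)$$

for in such circumstances $s'$ is not sufficient for $\sigma$, the appropriate estimator being
$\Sigma (x - \mu)^2$. The question is, what form of fiducial distribution must hold for $\sigma$ in order
that the “Student” form (20.14) should hold for $\mu$ when $\sigma$ is unknown?

Suppose the fiducial distribution is $f(s', \sigma)\, d\sigma$. We have then for the joint fiducial
distribution of $\mu$ and $\sigma$,

$$dF \propto \frac{1}{\sigma} \exp\left\{-\frac{n}{2\sigma^2}(\bar{x} - \mu)^2\right\} d\mu\; f(s', \sigma)\, d\sigma.$$

We have therefore to solve

$$d\mu \int_0^\infty \frac{1}{\sigma} \exp\left\{-\frac{n}{2\sigma^2}(\bar{x} - \mu)^2\right\} f(s', \sigma)\, d\sigma = \frac{k\, d\mu}{\left\{1 + \dfrac{n(\mu - \bar{x})^2}{(n-1)\, s'^2}\right\}^{n/2}} \qquad (20.19)$$

where $k$ is some constant. Putting $(\mu - \bar{x})^2 = \alpha$, $\dfrac{n}{2\sigma^2} = \beta$, we have then to solve

$$\int_0^\infty e^{-\alpha\beta}\, \frac{1}{\sigma}\, f(s', \sigma)\, \frac{d\sigma}{d\beta}\, d\beta = \frac{k}{\left\{1 + \dfrac{n\alpha}{(n-1)\, s'^2}\right\}^{n/2}}.$$

Regarding $\alpha$ as the complex quantity $-it$ we see that $\dfrac{1}{\sigma}\, f(s', \sigma)\, \dfrac{d\sigma}{d\beta}$ is the frequency
function whose characteristic function is $1 \Big/ \left\{1 + \dfrac{n\alpha}{(n-1)\, s'^2}\right\}^{n/2}$, which gives

$$\frac{1}{\sigma}\, f(s', \sigma)\, \frac{d\sigma}{d\beta} \propto \beta^{(n-2)/2} \exp\left\{-\frac{(n-1)\, s'^2 \beta}{n}\right\},$$

from which we find

$$f(s', \sigma) \propto \frac{1}{\sigma^n} \exp\left\{-\frac{(n-1)\, s'^2}{2\sigma^2}\right\},$$

or, on evaluation of the constant,

$$f(s', \sigma)\, d\sigma = \frac{2}{\Gamma\{\tfrac{1}{2}(n-1)\}} \left\{\frac{(n-1)\, s'^2}{2\sigma^2}\right\}^{(n-1)/2} \exp\left\{-\frac{(n-1)\, s'^2}{2\sigma^2}\right\} \frac{d\sigma}{\sigma}. \qquad (20.20)$$

This, then, is the fiducial distribution which $\sigma$ must obey. We should have arrived at
the same result had we taken (20.18) and transformed it to the fiducial form, as if it related
to $s'$ and $\sigma$ only and the former were sufficient for the latter.

It appears, then, that in this case at least the fiducial method gives consistent results
when two parameters are involved. The general problem of many parameters presents
difficulties and has not been elucidated to any great extent.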


The Logic of Fiducial Inference 

20.10. The notion of fiducial probability was introduced by Fisher (1930) for the
case of a single parameter. Regarding the estimate $t$ as fixed, Fisher considers the distribution
of values of $\theta$ for which $t$ can be regarded as a representative estimate; representative,
that is to say, in the sense that it could have arisen by random sampling from the
population specified by $\theta$. As pointed out above, this does not mean that we are regarding
the true value of $\theta$ as a member of an existing population. Rather, we are considering the
possible values of $\theta$ and attaching to each value a measure of our confidence in it, based
on the probability that it could have given rise to the observed $t$.

If I interpret him correctly, Fisher would regard a fiducial distribution as a frequency-distribution.
This implies that $\theta$ is regarded as a random variable. It appears to me,
however, that it is not a random variable in the ordinary sense of the frequency theory
of probability, in which values of $\theta$ either are or can be generated by an actual sampling
process. We can never test whether the fiducial distribution holds in the frequency sense
by drawing a number of values and comparing observation with theory. Nor, in calculating
fiducial limits of the type $\theta = t + h(\alpha)$, do we imply that the proportion of cases
for which $\theta \le t + h$ is true will be $\alpha$ in the long run.


20.11. The reader has a choice of several attitudes towards the foundations of the
fiducial argument: (a) he can accept the argument as involving a new postulate of inference;
(b) he can regard it as sanctioned by the approach of the previous section; or (c) he
can, so far as estimates based on a single parameter are concerned, console himself with
the thought that the results of the process are the same as those given by the theory of
confidence intervals.


20.12. Although Fisher is careful to emphasise the distinction between his own 
approach and that based on Bayes’ postulate, it is interesting to note that the theory of 
inverse probability as modified by Jeffreys gives results which are in many cases identical 
with those of fiducial inference. 

In the example of 20.2, for instance, suppose that the prior distribution of $\mu$ is $f(\mu)\, d\mu$.
Then for any given $\bar{x}$ the posterior probability of $\mu$ is

$$dF = f(\mu)\, d\mu\, \sqrt{\frac{n}{2\pi}} \exp\left\{-\frac{n}{2}(\bar{x} - \mu)^2\right\}. \qquad (20.21)$$

If the total probability is unity we have

$$\int_{-\infty}^{\infty} f(\mu) \sqrt{\frac{n}{2\pi}} \exp\left\{-\frac{n}{2}(\bar{x} - \mu)^2\right\} d\mu = 1. \qquad (20.22)$$

Clearly $f(\mu) = 1$ is a solution, and we may use characteristic functions to show that it is
the only solution. In fact we have from (20.22), writing $it$ for $n\bar{x}$,

$$\int_{-\infty}^{\infty} f(\mu) \exp\left(-\frac{n\mu^2}{2}\right) e^{it\mu}\, d\mu = \sqrt{\frac{2\pi}{n}} \exp\left(-\frac{t^2}{2n}\right).$$

The expression on the right is the characteristic function of $\exp\left(-\dfrac{n\mu^2}{2}\right)$ and hence

$$f(\mu) \exp\left(-\frac{n\mu^2}{2}\right) = \exp\left(-\frac{n\mu^2}{2}\right),$$

or $f(\mu) = 1$.

We have, then, for the posterior probability distribution of $\mu$,

$$dF = \sqrt{\frac{n}{2\pi}} \exp\left\{-\frac{n}{2}(\bar{x} - \mu)^2\right\} d\mu, \qquad (20.23)$$

which is the same as the fiducial distribution. The requirement that $f(\mu) = 1$ is equivalent
to a prior distribution of $\mu$, $dF = d\mu$, which is the form given by Bayes’ postulate for a
parameter which can extend to infinity in either direction.


Example 20.2

In Example 20.1, a similar argument leads to a prior distribution of $\theta$,

$$dF \propto \frac{d\theta}{\theta}.$$

This is the form given by Jeffreys’ modification of Bayes’ postulate when a parameter
can extend to infinity in only one direction.

It does not appear, however, that fiducial and inverse probability always give the
same results. Consider the distribution of the correlation coefficient in normal samples
(14.14):

$$dF \propto (1 - \rho^2)^{(n-1)/2} (1 - r^2)^{(n-4)/2}\, \frac{d^{\,n-2}}{d(r\rho)^{n-2}} \left\{\frac{\cos^{-1}(-\rho r)}{\sqrt{1 - \rho^2 r^2}}\right\} dr. \qquad (20.24)$$

The argument of the type we have just employed would require a prior distribution of $\rho$,

$$dF \propto \frac{d\rho}{(1 - \rho^2)^{3/2}},$$

and the resulting posterior distribution (which is equivalent to that obtained by interchanging
$r$ and $\rho$ in (20.24)) is not the same as we should get by using equation (20.8).
Behrens’ Test

20.13. Suppose we have two samples of $n_1$ and $n_2$ members from normal populations
with possibly unequal variances. The fiducial distributions of $\mu_1$ and $\mu_2$ are of the
“Student” form (20.14). Writing temporarily in this and the next section $s_1^2$ for
$\Sigma (x_1 - \bar{x}_1)^2 / \{n_1 (n_1 - 1)\}$ and similarly for $s_2^2$, and putting

$$\mu_1 = \bar{x}_1 + s_1 t_1$$
$$\mu_2 = \bar{x}_2 + s_2 t_2$$

we have

$$\mu_1 - \mu_2 = \bar{x}_1 - \bar{x}_2 + s_1 t_1 - s_2 t_2. \qquad (20.26)$$

If now

$$e = \frac{(\mu_1 - \mu_2) - (\bar{x}_1 - \bar{x}_2)}{\sqrt{s_1^2 + s_2^2}},$$

$e$ depends only on the known quantities $\bar{x}$ and $s'$ and the difference of means $\mu_1 - \mu_2$.
From the fiducial distributions of $\mu_1$ and $\mu_2$ we can find that of $e$, and hence make fiducial
statements of the type

$$\bar{x}_1 - \bar{x}_2 - e_0 \sqrt{s_1^2 + s_2^2} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + e_1 \sqrt{s_1^2 + s_2^2}. \qquad (20.27)$$


20.14. The distribution of $e$ is not of a simple form. Putting $\tan \psi = s_2/s_1$ we see that

$$e = t_1 \cos \psi - t_2 \sin \psi, \qquad (20.28)$$

so that $e$ is distributed fiducially as the weighted difference of two variables, each of which
is distributed as “Student's” $t$. We have then to find the distribution of

$$e = t_1 \cos \psi - t_2 \sin \psi$$

where the joint distribution of $t_1$ and $t_2$ is given by

$$dF \propto \frac{dt_1\, dt_2}{\left(1 + \dfrac{t_1^2}{n_1 - 1}\right)^{n_1/2} \left(1 + \dfrac{t_2^2}{n_2 - 1}\right)^{n_2/2}}. \qquad (20.29)$$

The distribution has been studied by Sukhatme (1938b) and in more detail by Fisher
(1941a). Tables are given for various values of $n_1$, $n_2$ and the ratio $s_1^2/s_2^2$ (or the equivalent
angle $\psi$) showing the values of $e$ corresponding to given probability levels. Some of
the tables are included in the second (1943) edition of Fisher and Yates’ Statistical Tables for
Agricultural, Biological and Medical Research.
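
In the absence of the tables, percentage points of $e = t_1 \cos \psi - t_2 \sin \psi$ can be approximated by simulation. The sketch below is a Monte Carlo illustration only, not the method used to compute the published tables; the function names, the number of draws and the seed are assumptions.

```python
import math
import random


def student_t(rng, dof):
    """One draw from Student's t with `dof` degrees of freedom (normal over root chi-square)."""
    chi_square = rng.gammavariate(dof / 2.0, 2.0)
    return rng.gauss(0.0, 1.0) / math.sqrt(chi_square / dof)


def behrens_point(n1, n2, psi, prob, draws=200_000, seed=7):
    """Monte Carlo estimate of the value of e with cumulative probability `prob`.

    t1 and t2 are independent Student variables with n1 - 1 and n2 - 1
    degrees of freedom, and e = t1*cos(psi) - t2*sin(psi).
    """
    rng = random.Random(seed)
    cos_psi, sin_psi = math.cos(psi), math.sin(psi)
    values = sorted(
        student_t(rng, n1 - 1) * cos_psi - student_t(rng, n2 - 1) * sin_psi
        for _ in range(draws)
    )
    return values[min(draws - 1, int(prob * draws))]
```

When $\psi = 0$ the variable $e$ reduces to $t_1$, so that `behrens_point(n1, n2, 0.0, prob)` should reproduce the ordinary “Student” point within sampling error, which gives a simple check on the simulation.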


20.15. The joint distribution of s;? and s;* is 


dF ec sm-3 sn? exp = TL REST (aye) s) ds, deis. 
oi 99 


Putti m5 and «wed js n 
utting en and w—1 Dc Di 
we find, on a little reduction, 
1n ee dp d (Qm-4) e-u 2 
fe pm —1) ,m— Limma A e^" du. . (20.30) 
ru m PNE s 


Thus $u$ is distributed (independently of $\phi$) in the Type III form. Further,
$(\bar{x}_1 - \mu_1) - (\bar{x}_2 - \mu_2)$ is distributed normally about zero mean with variance $\sigma_1^2/n_1 + \sigma_2^2/n_2$.

Hence, if $\sigma_1^2/\sigma_2^2 = \theta$, we find that the quotient


oz 
(E — pa) — (Es — wa) }2 (ms Hna — 2) L o El p)(m Em — 2) (20.31) 
A (n, — 1) 8,? (n, — 1) s? x MEC j 
arae p E] {mD a 5040 


is distributed as $t^2$ with $n_1 + n_2 - 2$ degrees of freedom. (Cf. Example 10.17, vol. I,
p. 248, for the distribution of a normal variate divided by a Type III variate.)

Now if we knew $\theta$ we could find fiducial (or confidence) limits to $e$, and hence to $\mu_1 - \mu_2$,
in the usual way, for the distribution of $e$ would then be independent of unknown constants
and ascertainable from “Student's” integral. Since, however, $\theta$ is not known, we require
in turn the fiducial distribution of this quantity. Since


"2 "2 
— 4 Jog (HSE [mi s 
| e SE 
is distributed in Fisher's form (cf. Example 10.18, vol. I, p. 249), the required fiducial
form for $\theta$ can be obtained from that of $z$, which incidentally is equivalent to that of $\phi$
in (20.30). If we express (20.31) as the joint fiducial distribution of $e$ and $\theta$ and integrate
out for $\theta$, we shall be left with an equivalent form to that derived from (20.29).


20.16. It also follows from the above that the inequality (20.27) is not satisfied in
proportion $\alpha$ of the cases independently of $\theta$, so that the limits to $\mu_1 - \mu_2$ are not confidence
limits, although they are fiducial limits. It will, in fact, be evident enough from (20.31)
that if we determine $t_0$ and $t_1$ so that the integral of “Student's” form between those
limits is $\alpha$, then the corresponding limits for $e$, say $e_0$ and $e_1$, are dependent on the variance
ratio $\theta = \sigma_1^2/\sigma_2^2$. This is fairly evident on general grounds, and the point has been put
beyond doubt by both Fisher (1937b) and Neyman (1941a), who have worked out particular
cases of difference.

The fiducial distribution of $e$ (which is an extension by Fisher of a result given by
Behrens as early as 1929) thus provides a crucial point of difference between the theory of
fiducial inference and that of confidence intervals.


20.17. In conclusion, we will indicate the viewpoint of Jeffreys towards the type of
problem dealt with by “Student's” distribution for limits to the mean and Behrens’
distribution for limits to the difference of two means.

If $H$ denotes the general data, we have for the “Student” distribution

$$P\{dt \mid \mu, \sigma, H\} = \frac{k\, dt}{\left(1 + \dfrac{t^2}{\nu}\right)^{(\nu+1)/2}}. \qquad (20.32)$$

The expression on the left states the probability that $t$ will lie in a given range $dt$ on the
assumption that $H$ is true, the parent mean being $\mu$ and the parent variance $\sigma^2$. Since
$\mu$ and $\sigma$ do not appear on the right they are irrelevant and may be suppressed, and hence

$$P\{dt \mid H\} = \frac{k\, dt}{\left(1 + \dfrac{t^2}{\nu}\right)^{(\nu+1)/2}}. \qquad (20.33)$$

Suppose now that we assume that

$$P\{dt \mid \bar{x}, s, H\} = f(t)\, dt. \qquad (20.34)$$

Then, as before, $\bar{x}$ and $s$ may be suppressed and we have

$$P\{dt \mid H\} = f(t)\, dt, \qquad (20.35)$$

and hence, by comparison with (20.33),

$$P\{dt \mid \bar{x}, s, H\} = \frac{k\, dt}{\left(1 + \dfrac{t^2}{\nu}\right)^{(\nu+1)/2}}. \qquad (20.36)$$

We can then proceed to find limits to $t$, given $\bar{x}$ and $s$, in the usual way. Jeffreys emphasises,
however, that this depends on a new postulate expressed by (20.34) which, though
natural, is not trivial. It amounts to an assumption that if we are comparing different
distributions, samples from which give different $\bar{x}$'s and $s$'s, the scale of the distribution
of $\mu$ must be taken proportional to $s$ and its mean displaced by the difference of sample
means.


20.18. In a similar way it will be found that to arrive at the Behrens distribution
it is necessary to postulate that

$$P\{dt_1, dt_2 \mid \bar{x}_1, \bar{x}_2, s_1, s_2, H\} = f_1(t_1)\, f_2(t_2)\, dt_1\, dt_2. \qquad (20.37)$$

Jeffreys’ derivation of the Behrens’ form from Bayes’ theorem would be as follows:
The prior probability of $d\mu_1\, d\mu_2\, d\sigma_1\, d\sigma_2 \mid H$ is

$$P\{d\mu_1\, d\mu_2\, d\sigma_1\, d\sigma_2 \mid H\} \propto \frac{d\mu_1\, d\mu_2\, d\sigma_1\, d\sigma_2}{\sigma_1 \sigma_2}.$$

The likelihood (denoting the data by $D$) is

$$P\{D \mid \mu_1, \mu_2, \sigma_1, \sigma_2, H\} \propto \frac{1}{\sigma_1^{n_1} \sigma_2^{n_2}} \exp\left[-\frac{n_1}{2\sigma_1^2}\{(\bar{x}_1 - \mu_1)^2 + s_1^2\} - \frac{n_2}{2\sigma_2^2}\{(\bar{x}_2 - \mu_2)^2 + s_2^2\}\right].$$

Hence, by Bayes’ theorem

$$P\{d\mu_1\, d\mu_2\, d\sigma_1\, d\sigma_2 \mid D, H\} \propto \frac{1}{\sigma_1^{n_1+1} \sigma_2^{n_2+1}} \exp\left[-\frac{n_1}{2\sigma_1^2}\{(\bar{x}_1 - \mu_1)^2 + s_1^2\} - \frac{n_2}{2\sigma_2^2}\{(\bar{x}_2 - \mu_2)^2 + s_2^2\}\right] d\mu_1\, d\mu_2\, d\sigma_1\, d\sigma_2.$$

Integrating out the values of $\sigma_1$ and $\sigma_2$, we find for the posterior distribution of $\mu_1$ and $\mu_2$

$$dF \propto \frac{d\mu_1\, d\mu_2}{\{(\bar{x}_1 - \mu_1)^2 + s_1^2\}^{n_1/2}\, \{(\bar{x}_2 - \mu_2)^2 + s_2^2\}^{n_2/2}},$$

a form which is easily reducible to (20.29).


20.19. To sum up: so far as concerns problems of estimation the Behrens test is 
accurate both in fiducial theory and in the theory of probability propounded by Jeffreys. 
But the test does not hold in the theory of confidence intervals. In fact the latter fails 
to provide an exact solution to the problem, though we shall see below (21.28) that approxi- 
mations are possible. Fisher has criticised confidence intervals on the ground that they 
do not give an answer to what is admittedly an important question ; but it appears possible 
to maintain consistently that some questions may not have an answer. 


NOTES AND REFERENCES 


For the general theory of fiducial inference see Fisher (1930a, 1933, 1935a, b, 1936c, 
1941a). The difficulties of reconciling Behrens’ test with confidence-interval theory were 
noticed by Bartlett (1936a) and led to some controversy, for which see Fisher (1937b,
1939a, 1940c), Bartlett (1939a), Yates (1939/), and Neyman (1941a). For Jeffreys’ views
see his papers of 19370, 1938c, 1939d and 1940. 

For the practical application of Behrens’ distribution see Sukhatme (1938b) and Fisher
(19412). Behrens himself stated his results explicitly only for the case of equality of sample
number, $n_1 = n_2$, the extension being given by Fisher (1935b).


EXERCISES 
20.1. If $\bar{x}$ is the mean of a sample of $n$ values from

$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\} dx,$$

$s'^2$ is equal to $\frac{1}{n-1}\sum (x - \bar{x})^2$, and $x$ is a further independent sample value, show that

$$t = \frac{x - \bar{x}}{s'} \sqrt{\frac{n}{n+1}}$$

is distributed in “Student's” form with $\nu = n - 1$. Hence show that fiducial limits
for $x$ are

$$\bar{x} \pm s'\, t_1 \sqrt{\frac{n+1}{n}},$$

where $t_1$ is chosen so that the integral of “Student's” form between $-t_1$ and $t_1$ is an
assigned probability $\alpha$.
(Fisher, 1935b. This gives an estimate of the next value when $n$ values have
already been chosen, and extends the idea of fiducial limits from parameters
to variates dependent on them.)


20.2. Show similarly that if a sample of $n_1$ values gives mean $\bar{x}_1$ and estimated variance
$s_1^2$, the fiducial distribution of mean $\bar{x}_2$ and estimated variance $s_2^2$ in a second sample of $n_2$ is

$$dF \propto \frac{s_1^{\,n_1 - 1}\, s_2^{\,n_2 - 2}\, d\bar{x}_2\, ds_2}{\left\{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dfrac{n_1 n_2}{n_1 + n_2}(\bar{x}_1 - \bar{x}_2)^2\right\}^{\frac{1}{2}(n_1 + n_2 - 1)}}.$$

Hence, allowing $n_2$ to tend to infinity, derive the simultaneous fiducial distribution of
$\mu$ and $\sigma$.

(Fisher, 1935b.)


CHAPTER 21 
SOME COMMON TESTS OF SIGNIFICANCE 


Tests of Significance 
21.1. We now pass from the problem of estimation to that of significance. The
two are closely allied and in practical problems they both arise together as a rule; but
it is useful to preserve a distinction between them. In estimation we try to find, with
greater or less accuracy, the value of some parameter in a population which is known to
be (or assumed to be) dependent on that parameter. In tests of significance we are given
some value of a parameter beforehand and wish to decide whether it is acceptable in the
light of the evidence. This is the distinction in its simplest terms, but of course the
associated problems become increasingly complex when several parameters are concerned.


21.2. From one point of view the problem of significance is logically anterior to that 
of estimation. Suppose we have records of the yields of two varieties of wheat grown 
under similar conditions, and are interested in a comparison of the average yields of the 
two. Our first question is whether the observed mean yields indicate any difference between 
the varieties—a matter of significance. Not until significant differences are established 
does our interest turn to the magnitude of the difference—a matter of estimation. Again, 
if we have a set of records of only one variety, our primary problem may be to decide 
whether they are consonant with the hypothesis of normality in the parent population, 
whatever its mean and variance ; and only when this point has been settled affirmatively 
do we proceed to estimate those parameters. 

Nevertheless, we have lost very little by taking the problem of estimation first. In 
some practical problems the question of significance is already decided, and in many others 
we use estimates of parameters to test the significance of the latter, in which case estimation
and significance become different aspects of the same statistical fact.


21.3. We shall consider the general theory of testing statistical hypotheses in Chapters 
26 and 27. That theory is, however, rather abstract, and we anticipate it to some extent 
in this chapter by giving an account of the principal tests in current use, without for the 
moment going too deeply into their rationale. It will be seen later that there are sometimes 
many significance tests which can be applied to the same problem, and that it is possible 
to lay down criteria for deciding which, if any, are the “best”. This aspect of the subject
will not concern us for the present. We shall not discuss whether the tests we describe 
are the best possible (though some of them, in fact, are so) but shall merely present them 
as useful and convenient, albeit perhaps not unique, solutions of our problems. 


21.4. Developments in statistical theory in the last two decades have resulted in 
a great many tests of significance appropriate to special problems. It is not easy to classify 
them and quite impossible to deal extensively with them all. We shall consider them 
under the following heads :— 

(a) Tests of the significance of a specified parameter value.—The typical hypothesis 
here is that a parameter in a population of known form has a specified value (usually 
zero). We wish to know whether the evidence provided by the sample supports the 
hypothesis or not. 
(b) Tests of goodness of fit.—The hypothesis is that the population is of a certain
kind which is either fully specified beforehand or can be “estimated” with the help
of the data. We wish to know whether the sample values fit this population in the 
sense that they could have arisen from it by random sampling to any acceptable degree 
of probability. This hypothesis is more general than that of (a) since it concerns 
the whole distribution function and not merely one of its parameters. 

(c) Tests of homogeneity.—The hypothesis here concerns two or more populations,
each providing a contribution to the sample. We wish to test whether the populations 
have certain parameters in common, or in the extreme case, whether they are identical. 
This case can be regarded as an elaboration of (a) where several parameters are simul- 
taneously tested. In the particular case when only two populations are concerned 
we may sometimes reduce it directly to type (a) by considering differences; e.g. if 
we are making a comparison of parent means the hypothesis might be that the single 
difference of means is zero. 

In addition we shall also consider two sets of tests of rather a different kind :— 

(d) Tests of order of occurrence.—The hypothesis here is that the sample members
occurred in random order, and we wish to ascertain whether the observed order indicates 
any systematic effects, as, for instance, whether there are any cyclical effects in time- 
series. The test here is of the sampling process rather than of parameters of the 
parent population. 

(e) Conditional tests.—The hypothesis may be any one of the above types, but 
we restrict the inference to a sub-population for which certain qualities are deter- 
mined by the observed sample values. For instance, we may use the distribution 
of the sample variance $s^2$ for which the mean $\bar{x}$ is equal to the observed value. In
short the variation of sample values is conditioned. Type (d) may from some points
of view be regarded as a particular case of this type. 


It is not intended to convey that the above five categories are mutually exclusive.
A test of type (a) may, for example, be conditional or non-conditional. The classification 
will, however, provide some sort of articulation for a rather long chapter and serve to 
explain our sequence of treatment. 


Standard Errors 

21.5. For large samples the test of significance of a parameter can usually be carried
out by standard errors. We find an estimator $t$ of the parameter $\theta$ and consider whether
the given value of $\theta$ falls in the range $t_1 \pm k\sqrt{\operatorname{var} t}$, where $t_1$ is the value of $t$ for the observed
sample and $k$ is a constant chosen at will according to a probability $\alpha$. If so we may accept
the value of $\theta$, at least so far as this test is concerned; if not, we reject it.

If the variance of $t$ does not depend on unknown quantities such as other parameters,
this type of inference is justifiable as an application of the theory of confidence intervals.
In accepting $\theta$ when it falls in the range $t_1 \pm k\sqrt{\operatorname{var} t}$, we shall be right in proportion $\alpha$ of
the cases in the long run. As a refinement we may, of course, use non-central intervals
and locate $\theta$ in an asymmetrical range $t_1 - k_1\sqrt{\operatorname{var} t}$ to $t_1 + k_2\sqrt{\operatorname{var} t}$. The test of
significance is equivalent to the estimation of the true value of $\theta$; and it will clearly be better
if the range of estimation is narrower, for then we reject more wrong values of $\theta$.
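
The acceptance rule of 21.5 can be sketched in a few lines of Python. This is only an illustration under the large-sample normal approximation; the function names `coverage` and `accepts` and the figures in the usage lines are hypothetical and do not come from the text.

```python
import math

def coverage(k):
    """P(|Z| <= k) for a standard normal Z: the long-run proportion of
    correct acceptances when the estimator is normal with known variance."""
    return math.erf(k / math.sqrt(2.0))

def accepts(theta0, t1, var_t, k):
    """True if the hypothetical value theta0 lies in t1 +/- k * sqrt(var t)."""
    half_width = k * math.sqrt(var_t)
    return t1 - half_width <= theta0 <= t1 + half_width

# Hypothetical figures: observed estimate 10.3 with sampling variance 0.04.
print(round(coverage(1.96), 4))          # proportion associated with k = 1.96
print(accepts(10.0, 10.3, 0.04, 1.96))   # 10.0 lies inside the range
print(accepts(9.8, 10.3, 0.04, 1.96))    # 9.8 lies outside the range
```

A narrower range (smaller variance of the estimator for the same `k`) rejects more wrong values, which is the sense in which the text calls it better.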


21.6. If the variance of the estimator $t$ depends on unknown parameters $\theta_1, \dots, \theta_p$
we can usually substitute estimates of those parameters obtained from the sample itself,
provided that the sample is large. For example, we have for normal samples

$$P\left\{\mu < \bar{x} + \frac{2\sigma}{\sqrt{n}}\right\} = 0.97725.$$

The sample standard deviation $s$ will differ from $\sigma$ by a quantity of order $1/\sqrt{n}$, so that
to that order

$$P\left\{\mu < \bar{x} + \frac{2s}{\sqrt{n}}\right\} \approx 0.97725.$$

The approximation breaks down for small samples, and more accurate methods are required.
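
How quickly the approximation degrades can be checked numerically. The sketch below assumes normal samples and takes $s^2$ with divisor $n$, so that $(\bar{x} - \mu)\sqrt{n-1}/s$ follows “Student's” form with $n - 1$ degrees of freedom; the helper names and the Simpson's-rule integration are illustrative choices, not part of the text.

```python
import math

def normal_cdf(z):
    """Distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def t_pdf(t, nu):
    """Ordinate of Student's distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2.0) - math.lgamma(nu / 2.0)
    return math.exp(log_c) / math.sqrt(nu * math.pi) * (1.0 + t * t / nu) ** (-(nu + 1) / 2.0)

def t_cdf(t, nu, steps=2000):
    """F(t) = 1/2 + integral from 0 to t, by composite Simpson's rule."""
    if t == 0.0:
        return 0.5
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3.0
    return 0.5 + area if t > 0 else 0.5 - area

def prob_with_estimated_sd(n):
    """Exact P(mu < xbar + 2 s / sqrt(n)) when s replaces sigma."""
    return t_cdf(2.0 * math.sqrt((n - 1) / n), n - 1)

print(round(normal_cdf(2.0), 5))
for n in (5, 15, 100, 400):
    print(n, round(prob_with_estimated_sd(n), 4))
```

For small $n$ the exact probability falls well short of the normal value, while for a few hundred observations the two agree closely.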


21.7. The use of standard errors in testing significance has been illustrated in previous 
chapters, and we need not enlarge on the process further. We may, however, remark 
two things :— 

(a) That if the distribution of an estimator $t$ tends to normality for large samples
irrespective of the parent form (as, for instance, is the case with the mean and other moments
under very general conditions), it is not necessary that the hypothesis should specify the
parent form. In short, our test of significance is independent of the parent, a valuable
generality which rarely obtains for small samples.

(b) That we have justified the logic of reasoning involving the use of standard errors
by the theory of confidence intervals (and a similar justification can be given in terms
of fiducial intervals if we use an efficient estimator for which the loss of information tends
to zero relative to the total information in large samples). This appears to be the most
satisfactory basis for the use of standard errors. The usual intuitive basis advanced
(necessarily) in introductory textbooks is not easy to defend. For instance, it is customary
to reject a value of $\theta$ if it gives to an observed $t_1$ or greater value a small probability; and
there is no obvious reason why we should base our inference on the improbability of
values greater than $t_1$, namely on the improbability of something which has not occurred (see 21.55
below). Our present approach shows that in fact the use of standard errors can be justified
logically without invoking a new principle of inference.


Significance of the Mean in Normal Samples 


21.8. Suppose we have a sample from a parent population which is known to be
normal, but of whose mean and variance we are ignorant. We wish to test the significance
of a given value $\mu_0$ of the mean, that is to say, we wish to consider whether the observations
could, to any acceptable probability, have been derived from a population with mean $\mu_0$,
whatever the variance may be.

We calculate the statistic

$$t = \frac{(\bar{x} - \mu_0)\sqrt{n - 1}}{s}, \qquad \nu = n - 1, \qquad (21.1)$$

where $s^2 = \frac{1}{n}\sum (x - \bar{x})^2$, all the quantities in which are given. We know that the distribution of $t$ is

$$dF = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\; \Gamma\left(\frac{\nu}{2}\right)} \frac{dt}{\left(1 + \dfrac{t^2}{\nu}\right)^{\frac{1}{2}(\nu + 1)}}, \qquad (21.2)$$

and hence can find the probability that our calculated value of $t$ is attained or exceeded.
If this is small we reject $\mu_0$; if not, we accept it. What values are regarded as “small”
for this purpose is a matter of convention, but the most frequently used values are 0.05,
0.01 and 0.001.

From the work of the previous two chapters it will be evident that this type of inference
is the confidence- or fiducial-interval approach in a slightly different form. Given
$\alpha$ we can find $-t_1$ and $t_0$ such that the integral of $dF$ in (21.2) between those limits is $\alpha$.
This gives us confidence or fiducial limits to $\mu$ of the type $\bar{x} - \dfrac{t_0 s}{\sqrt{n-1}}$ and $\bar{x} + \dfrac{t_1 s}{\sqrt{n-1}}$; and if
$\mu_0$ lies in this range we accept it. In particular cases we may have $t_0 = t_1$, in which case
the intervals are central and our probability $\alpha$ is the chance of $t$ being attained or exceeded
in absolute value; or $t_0 = +\infty$, in which case $\alpha$ is the chance that $-t_1$ will be attained
or exceeded, and no lower limit to $\mu$ is imposed.


Example 21.1 

The weights of fifteen bags of sugar taken from a filling machine are found to be, in
ounces, 16.1, 15.8, 15.8, 15.9, 16.1, 16.2, 16.0, 15.9, 16.0, 15.7, 15.7, 15.8, 16.0, 16.0, 15.8.
Each bag should be 16 ounces, but some deviation is inevitable. One of the manufacturer's
problems, of course, is to keep this deviation to a minimum, but that is not the
point we now consider. Our question is: if the machine is supposed to be giving weights
of 16 ounces on the average, does the sample suggest that it is failing in its purpose?

The hypothesis is that the parent mean is 16 ounces and the deviations from this
mean are, in order of magnitude, −0.3 (twice), −0.2 (four times), −0.1 (twice), 0.0
(four times), 0.1 (twice), 0.2 (once). The sample mean of these deviations is thus −0.08 and to that extent
the average of the sample is slightly underweight. Is this a significant effect?

It will be found that $s^2 = 0.0216$ so that

$$|t| = \frac{0.08\sqrt{14}}{\sqrt{0.0216}} = 2.04, \qquad \nu = 14.$$

From Appendix Table 3 (vol. I, p. 440) we find that for $\nu = 14$ the probability of a deviation
greater in absolute magnitude than 2.04 is about 2 (1 − 0.969) = 0.062. This is small,
but whether we regard it as significant or not depends on the probabilities we are prepared
to consider as defining significance. The usual values are 0.05 and 0.01, and with such
criteria we should not take the observed value as significant, though it arouses suspicions.
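
The arithmetic of this example can be retraced with a short Python sketch. It assumes the statistic $t = (\bar{x} - \mu_0)\sqrt{n-1}/s$ with $s^2$ computed with divisor $n$, and it obtains the distribution function by Simpson's rule rather than from Appendix Table 3; the helper names are illustrative only.

```python
import math

def t_pdf(t, nu):
    """Ordinate of Student's distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2.0) - math.lgamma(nu / 2.0)
    return math.exp(log_c) / math.sqrt(nu * math.pi) * (1.0 + t * t / nu) ** (-(nu + 1) / 2.0)

def t_cdf(t, nu, steps=2000):
    """F(t) = 1/2 + integral from 0 to t, by composite Simpson's rule."""
    if t == 0.0:
        return 0.5
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3.0
    return 0.5 + area if t > 0 else 0.5 - area

weights = [16.1, 15.8, 15.8, 15.9, 16.1, 16.2, 16.0, 15.9,
           16.0, 15.7, 15.7, 15.8, 16.0, 16.0, 15.8]
mu0 = 16.0                                       # hypothetical parent mean
n = len(weights)
mean = sum(weights) / n
s2 = sum((x - mean) ** 2 for x in weights) / n   # divisor n
nu = n - 1
t = (mean - mu0) * math.sqrt(n - 1) / math.sqrt(s2)
p_two_sided = 2.0 * (1.0 - t_cdf(abs(t), nu))

print(round(mean - mu0, 2), round(s2, 4), round(t, 2), round(p_two_sided, 3))
```

The two-sided probability comes out a little above 0.06, in line with the tabular value quoted in the example.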

We have here used central intervals, which are usual for the t-test of significance
of the mean; but it is easy to imagine circumstances in this particular case for which
non-central intervals might be required. For instance, if the machine was at fault and
had a true mean filling weight of more than 16 ounces the manufacturer would be giving
sugar away for nothing. This might be serious, but probably not so serious as if the
machine was erring in the other direction, which would render him liable to prosecution
for selling short weight. Suppose he assessed the latter risk as nine times as serious as
the former and was working to a probability level of 0.05. Then he would require
the probability of a negative value of $t$ greater than the significance value to be
0.955 (= 1 − 0.045) but could allow that of a positive value less than the significance value
to be 0.995 (= 1 − 0.005). From Appendix Table 3 we see that this corresponds to
deviations of approximately −1.8 and +3.0. Our observed value is outside this range
and is thus significant. Small as the average shortage is, it would be prudent to overhaul
the machine and to make sure that it is giving fair weight on the average.

We may note further that if the sample had occurred in the order

15.7, 15.7, 15.8, 15.8, 15.8, 15.8, 15.9, 15.9, 16.0, 16.0, 16.0, 16.0, 16.1, 16.1, 16.2

we should almost certainly have concluded that there was something wrong with the
machine, for the weights are steadily rising. The t-test would give the same result for 
this sample as for the first, since it does not depend on the order of occurrence of the mem- 
bers. Where, therefore, the appearance of individual sample members is ordered in time, 
the t-test alone may fail to reveal significant effects due to the changing of the population 
between drawings. Our data are still such as could have arisen at a single drawing of 
fifteen members from a population with mean equal to 16 ounces; but the data throw 
doubt on the point whether we are really asking the right question in assuming that they 
all came from the same population. We consider the point again below (21.41). 

Before leaving this example, we may note another possible test, cruder than the t-test
but sometimes useful. If the parent mean of the deviations were really zero, positive and negative
deviations should occur equally frequently in the long run. In our present case there are 8
negative deviations, 3 positive ones and 4 zero. If we allot, conventionally, two of the
last to each group we have 10 negative and 5 positive deviations. The expected number
is $7\frac{1}{2}$, so that the deviation is $2\frac{1}{2}$, with a standard error of $\sqrt{15 \times \frac{1}{2} \times \frac{1}{2}} = 1.94$. The
observed deviation is very little in excess of this, so we conclude that the preponderance
of negative signs in the sample is not significant of a negative mean in the population.
More exactly, we find that the probability of 5 or fewer positive deviations is the sum of
the first six terms in the binomial $(\frac{1}{2} + \frac{1}{2})^{15}$, namely 0.151, leading to the same conclusion.
The test is a very rough one since it pays no attention to the magnitude of the deviations;
but it has the advantage of applying to any symmetrical form of parent population for
finite samples.
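
The sign count and the binomial sum are easy to reproduce. The sketch below works with the deviations from 16 ounces in units of 0.1 ounce and follows the convention of splitting the zero deviations equally between the two groups; the variable names are illustrative.

```python
import math

# Deviations from 16 oz in units of 0.1 oz, in the order of the original sample.
deviations = [1, -2, -2, -1, 1, 2, 0, -1, 0, -3, -3, -2, 0, 0, -2]

n = len(deviations)
negatives = sum(1 for d in deviations if d < 0)
positives = sum(1 for d in deviations if d > 0)
zeros = n - negatives - positives

# Allot half of the zero deviations to each group.
positives_adj = positives + zeros // 2

expected = n / 2.0
std_error = math.sqrt(n * 0.5 * 0.5)
# Probability of positives_adj or fewer positive signs under (1/2 + 1/2)^n.
tail = sum(math.comb(n, k) for k in range(positives_adj + 1)) / 2 ** n

print(negatives, positives, zeros)
print(expected - positives_adj, round(std_error, 2), round(tail, 3))
```

Only the signs enter the calculation, which is why the test holds for any symmetrical parent but wastes the information carried by the magnitudes.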


Properties of the t-Distribution 


21.9. “Student's” distribution has numerous applications in the testing of significance
apart from the one just considered, and we proceed to study its properties.

The form (21.2) is a Pearson Type VII and may be transformed to the Beta-distribution
(Type I) by the substitution $x = 1 \Big/ \left(1 + \dfrac{t^2}{\nu}\right)$. The distribution function of $t$ may thus
be obtained direct from the B-function. For instance, we have, for $t \ge 0$,

$$F(t) = \int_{-\infty}^{t} dF = \frac{1}{2} + \int_0^t \frac{1}{\sqrt{\nu}\, B\left(\frac{1}{2}, \frac{1}{2}\nu\right)} \frac{dt}{\left(1 + \dfrac{t^2}{\nu}\right)^{\frac{1}{2}(\nu + 1)}}
= \frac{1}{2} + \frac{1}{2 B\left(\frac{1}{2}\nu, \frac{1}{2}\right)} \int_x^1 u^{\frac{1}{2}\nu - 1} (1 - u)^{-\frac{1}{2}}\, du,$$

whence

$$F = 1 - \tfrac{1}{2} I_x\left(\tfrac{1}{2}\nu, \tfrac{1}{2}\right). \qquad (21.3)$$

The values of the argument for which $I$ has the values 0.50, 0.25, 0.10, 0.05, 0.025, 0.01,
0.005 and $\nu$ = 1 (1) 30, 40, 60, 120, $\infty$, have been tabled to five significant figures by C. M.
Thompson and others (1941a) and can hence be used to derive the values of $t$ corresponding
to those probability levels.


21.10. Except for special purposes, however, the use of the B-function is unnecessary,
since the distribution function of $t$ itself and tables based thereon are available.

We have

$$\log\left(1 + \frac{t^2}{\nu}\right) = \frac{t^2}{\nu} - \frac{t^4}{2\nu^2} + \frac{t^6}{3\nu^3} - \dots$$

and hence

$$-\frac{\nu + 1}{2} \log\left(1 + \frac{t^2}{\nu}\right) = -\frac{1}{2}t^2 + \frac{1}{4\nu}(t^4 - 2t^2) - \frac{1}{12\nu^2}(2t^6 - 3t^4) + \frac{1}{24\nu^3}(3t^8 - 4t^6) - \dots \qquad (21.4)$$

Further, from the expansion for $\log \Gamma(1 + x)$ we find

$$\log \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\frac{\nu}{2}}\; \Gamma\left(\frac{\nu}{2}\right)} = -\frac{1}{4\nu} + \frac{1}{24\nu^3} - \frac{1}{20\nu^5} + \dots \qquad (21.5)$$

Now as $\nu$ tends to infinity, $t$ tends to the normal form with zero mean and unit variance.
Writing

$$y_0 = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}t^2},$$

we find for the logarithm of the ordinate of (21.2), in descending powers of $\nu$,

$$\log \frac{y}{y_0} = \frac{1}{4\nu}(t^4 - 2t^2 - 1) - \frac{1}{12\nu^2}(2t^6 - 3t^4) + \frac{1}{24\nu^3}(3t^8 - 4t^6 + 1) - \frac{1}{40\nu^4}(4t^{10} - 5t^8) + \frac{1}{60\nu^5}(5t^{12} - 6t^{10} - 3) - \dots \qquad (21.6)$$

Taking the exponential and integrating from $t$ to $\infty$, we find

$$1 - F = \int_t^\infty y_0\, dt + y_0 \Big\{ \frac{1}{4\nu}(t^2 + 1)t + \frac{1}{96\nu^2}(3t^6 - 7t^4 - 5t^2 - 3)t + \frac{1}{384\nu^3}(t^{10} - 11t^8 + 14t^6 + 6t^4 - 3t^2 - 15)t$$
$$\qquad + \frac{1}{92160\nu^4}(15t^{14} - 375t^{12} + 2225t^{10} - 2141t^8 - 939t^6 - 213t^4 - 915t^2 + 945)t + \dots \Big\}. \qquad (21.7)$$

This is the expression, due to Fisher, which was used by “Student” himself in calculating
the distribution function of $t$ given in Appendix Table 3, Vol. I. For values of $\nu > 18$ the
first four terms of (21.7) give $F$ to an accuracy of about 0.000,005.
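
The expansion can be compared with a direct numerical integration of (21.2). The sketch below assumes the series is written for the upper tail $1 - F$, with correction polynomials divided by $4\nu$, $96\nu^2$, $384\nu^3$ and $92160\nu^4$; the function names and the use of Simpson's rule for the reference value are illustrative choices.

```python
import math

def t_pdf(t, nu):
    """Ordinate of Student's distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2.0) - math.lgamma(nu / 2.0)
    return math.exp(log_c) / math.sqrt(nu * math.pi) * (1.0 + t * t / nu) ** (-(nu + 1) / 2.0)

def upper_tail_numeric(t, nu, steps=2000):
    """1 - F(t) for t >= 0, integrating the ordinate from 0 to t by Simpson's rule."""
    if t == 0.0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3.0

def upper_tail_series(t, nu):
    """Normal tail plus corrections in descending powers of nu."""
    y0 = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    normal_tail = 0.5 * math.erfc(t / math.sqrt(2.0))
    p1 = t * (t**2 + 1) / 4.0
    p2 = t * (3*t**6 - 7*t**4 - 5*t**2 - 3) / 96.0
    p3 = t * (t**10 - 11*t**8 + 14*t**6 + 6*t**4 - 3*t**2 - 15) / 384.0
    p4 = t * (15*t**14 - 375*t**12 + 2225*t**10 - 2141*t**8
              - 939*t**6 - 213*t**4 - 915*t**2 + 945) / 92160.0
    return normal_tail + y0 * (p1 / nu + p2 / nu**2 + p3 / nu**3 + p4 / nu**4)

for t, nu in [(1.0, 30), (2.0, 20), (2.5, 25)]:
    print(t, nu, round(upper_tail_series(t, nu), 6), round(upper_tail_numeric(t, nu), 6))
```

For moderate $\nu$ the two columns agree to several decimal places, which is the property the text relies on for the tabulation.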


21.11. Tables are also available in the “inverse” form, that is to say, giving values
of $t$ corresponding to specified values of $\nu$ and $F$. Such tables may be derived by
interpolation from the “Student” tables or by the normalisation method of 6.32. In work
involving tests of significance this type of table is perhaps the most convenient, since it
enables one to decide without calculation (other than interpolation for values of the
argument not covered by the tables) whether particular values are significant for chosen
probability $\alpha$. The complement of the probability $\alpha$ is spoken of as a level of significance
and expressed either as a number between 0 and 1 or as a percentage. Similarly the
corresponding values of $t$ are called significance points, and we may speak, for example,
of the 5 per cent. value of $t$, meaning that value for which $F$ is 0.95.

Fisher and Yates (1938a) give the values of $t$ for $\nu$ = 1 (1) 30, 40, 60, 120 and $\infty$ and
$2(1 - F)$ = 0.9 (0.1) 0.1, 0.05, 0.02, 0.01, 0.001. These tables, it should be remembered,
give the significance points corresponding to twice $1 - F$, that is to say the values of $t$
such that the proportion of the distribution outside the range $\pm t$ is $2(1 - F)$.
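
An “inverse” table entry can be obtained by solving $F(t) = p$ numerically. The sketch below uses bisection on a Simpson's-rule distribution function; this is a stand-in for the interpolation and normalisation methods named in the text, and the function names are illustrative.

```python
import math

def t_pdf(t, nu):
    """Ordinate of Student's distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2.0) - math.lgamma(nu / 2.0)
    return math.exp(log_c) / math.sqrt(nu * math.pi) * (1.0 + t * t / nu) ** (-(nu + 1) / 2.0)

def t_cdf(t, nu, steps=2000):
    """F(t) = 1/2 + integral from 0 to t, by composite Simpson's rule."""
    if t == 0.0:
        return 0.5
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3.0
    return 0.5 + area if t > 0 else 0.5 - area

def t_quantile(p, nu, tol=1e-6):
    """Value of t for which F(t) = p, with 0 < p < 1, found by bisection."""
    if p < 0.5:
        return -t_quantile(1.0 - p, nu, tol)
    lo, hi = 0.0, 1.0
    while t_cdf(hi, nu) < p:      # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Two-sided 5 per cent. point (2(1 - F) = 0.05) for nu = 14:
print(round(t_quantile(0.975, 14), 3))
# Non-central limits with F = 0.955 below and F = 0.995 above, nu = 14:
print(round(-t_quantile(0.955, 14), 2), round(t_quantile(0.995, 14), 2))
```

The last line gives limits close to the −1.8 and +3.0 read from the table in Example 21.1.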


21.12. The number $\nu$ is usually called the number of degrees of freedom of $t$. This
is an expression which occurs in other connections, and a few words of explanation are
desirable.

It has been seen that the variance of a normal sample is distributed like the sum of
$(n - 1)$ squares of independent variates (compare Example 10.5, vol. I, p. 238) and generally,
that if there are $k$ linear relations connecting the original variates, the sum of squares
of the originals is distributed as the sum of squares of $n - k$ independent normal variates of equal
variance. Each linear relation reduces the freedom of the variation, as it were, by unity.
It is thus natural to speak of the number of degrees of freedom, $\nu$, of a function such as
$s^2$, meaning thereby that it is distributed as the sum of squares of $\nu$ independent
normal variates with equal variance. The expression only has this natural meaning when
normal variation is concerned.

It so happens that the quantity $t$ depends on a parameter $\nu$ which is convenient for
tabulating its distribution function and is also the number of degrees of freedom of the
statistic $s^2$ entering into the denominator of $t$. $\nu$ may thus, by an extension of the term,
be called the number of degrees of freedom of $t$, but this usage does not imply that $t$ is
distributed as the sum of squares of normal variates.


Distribution of t in Non-normal Case 


21.13. Part of the price we have to pay for the precision of the t-test in small samples
is the assumption of normality in the parent. If the population is not normal we may still,
of course, consider the distribution of “Student's” ratio, which will remain independent
of the scale parameter; but complications appear because the parameters which express
the deviation from normality will, in general, appear in the sampling distribution.
Furthermore, the distributions of $\bar{x}$ and $s$ are no longer independent.

Let us in the first instance prove the last assertion, which is due to Geary (1936b),
in the form: If the mean and variance in samples from a population are independent
and the population has finite cumulants, it must be normal.

From 11.13 we have

$$\kappa(2\,1^r) = \frac{\kappa_{r+2}}{n^r}.$$

If mean and variance are independent, $\kappa(2\,1^r) = 0$ and hence $\kappa_{r+2} = 0$ for $r > 0$. Thus
the population must be normal. It is rather remarkable that we have not had to use
relations of the type $\kappa(2^p\,1^r) = 0$ in arriving at this result and that we need only
assume independence for one size of sample.
21.14. In the notation of Chapter 11 we write
E» kv 
8 key — Ky \t” 
2 vere R75) 


Ka 


t 


and expand in terms of powers of aS 
2 


find for the moments of ¢ about the parent mean, assumed zero, to order v^? 


: 1 3 
fe E Fi QR — 2h 4 Basha 


pa 1 + 2 +) + LO a — BAs As + 034) 


The method follows that of 11.23 and we 


m=- 5 U + 15 (2102, — 6625 + 10525 2, + nog) (21.8) 


SS i = fy + 1422) + i (102 — 304, + 242, 
+ 12022 4-44, — 13225 As — GA? + 16822 A, + 12028) 
where AS As 
Ka” 


If the parent form is symmetrical, cumulants of odd order vanish and we have, to
order $\nu^{-2}$ and first order terms in the $\lambda$'s:


Hi = a0 
Der oi AE SL s NEN 
hes y »À) 37 »—93 »* (21.9) 
18 , 102 2a, 302, 3(v — 1)? DA. 30A. 
pim? » | yt v y? ir (»—3)(»—5) » y? 


Except for the term in $\lambda_4$ these are the values of the moments of $t$ in “Student's”
distribution, and it follows that for symmetrical parents which are not excessively lepto-
or platykurtic we should not expect the t-test to be invalidated. If the parent is skew
the situation may be different.


21.15. The general skew case has been considered by E. S. Pearson and Adyanthaya
(1928, 1929) from the experimental viewpoint and by Bartlett (1935a) and Geary (1936b)
from the theoretical viewpoint. Various writers have derived exact distributions of $t$
in non-normal samples, but the sample numbers are, as a rule, trivially small and the
results of little practical value. Geary considers the population expressed by the first
two terms of the Gram-Charlier series

$$dF = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2} \left\{1 + \frac{\kappa_3}{6}(x^3 - 3x)\right\} dx \qquad (21.10)$$

and assumes that powers of $\kappa_3$ above the first may be neglected. He finds (cf. Exercise
21.1) that the frequency function of $t$ in this population is equal to the “Student” form
plus a corrective factor
tdt 


[A er 
c5 
Y 


ee + 1)} (21.11) 
The integral of this factor from $-\infty$ to $-t$ is


Ks 1 SaN miis) 21.12 
s (secus) 5 s » Pm A so 


giving the correction to be applied. (Geary gives a table for some representative values.)
This, of course, depends on $\kappa_3$, but even where exact knowledge of the skewness is not
available we may sometimes safeguard against error by considering the correction for
plausible values of $\kappa_3$.


Other Uses of the t-distribution 

21.16. The usefulness of “Student's” $t$ derives from the fact that it is independent
of the scale parameter, and the simplicity of its distribution from the fact that it is the
ratio of two independent variates, the numerator distributed normally and the denominator
distributed in the Type III form. We shall see below (21.26) that these properties can
be used to test the difference of two means in normal populations with equal variance,
and in Chapter 22 we shall encounter a test of regression coefficients which is based on
the same properties.

We have also noted that “Student's” form can be used to test the significance of the
product-moment correlation (14.15) and the Spearman rank correlation $\rho$ (16.18). These
facts are, however, in a sense accidental. They do not derive from the expression of the
parameters concerned as the ratio of a normal to a Type III variate, but from the simpler
fact that the distributions are of the Type II form (symmetrical with finite range) and
hence can be transformed to the “Student” distribution, which is of Type VII.
Symmetrical distributions of finite range can often be represented very approximately by a
transformation to the “Student” form, especially if they tend to normality.


Test of a Variance in Normal Samples 
21.17. The distribution of the sample variance $s^2$ in normal samples is

$$dF = \frac{1}{\Gamma\left(\frac{n-1}{2}\right)} \left(\frac{n}{2\sigma^2}\right)^{\frac{1}{2}(n-1)} \exp\left(-\frac{ns^2}{2\sigma^2}\right) (s^2)^{\frac{1}{2}(n-3)}\, d(s^2), \qquad 0 \le s^2 < \infty. \qquad (21.13)$$

Thus, given for consideration a value of $\sigma^2$ and an observed $s^2$, we can find the probability
that $s^2/\sigma^2$ is attained or exceeded and accept or reject $\sigma^2$ in the usual way. The
distribution function of (21.13) may be expressed as an incomplete $\Gamma$-function, or more
conveniently for statistical purposes in terms of $\chi^2$ ($= ns^2/\sigma^2$) with $\nu = n - 1$.


Example 21.2 
In Example 21.1 we found $s^2 = 0.0216$, $\nu = 14$. Could the data have arisen by chance
from a population in which the true variance is 0.01?

We have $\chi^2 = \dfrac{ns^2}{\sigma^2} = 32.4$, $\nu = 14$. From the diagram on p. 446 of vol. I we see
that the probability of such a value or greater is between 0.01 and 0.001, a very improbable
result; and hence we reject $\sigma^2 = 0.01$ as a value of the parent variance.
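
The probability read from the diagram can be computed directly. For an even number of degrees of freedom the upper tail of $\chi^2$ has a closed form as a finite sum, which the Python sketch below uses; the function name is illustrative, and the closed form applies only when $\nu$ is even.

```python
import math

def chi2_upper_tail_even(x, nu):
    """P(chi-square with nu d.f. >= x), valid for even nu only:
    exp(-x/2) * sum over k < nu/2 of (x/2)^k / k!."""
    if nu % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, nu // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

n = 15
s2 = 0.0216          # sample variance with divisor n, from Example 21.1
sigma2 = 0.01        # hypothetical parent variance under test
chi2 = n * s2 / sigma2
p = chi2_upper_tail_even(chi2, n - 1)

print(round(chi2, 1), round(p, 4))
```

The result lies between 0.001 and 0.01, as stated in the example.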

Once again this type of inference can be justified by the theory of confidence intervals
since the probability

$$P\left\{\frac{ns^2}{\sigma^2} \ge 32.4\right\} < 0.01$$

is equivalent to

$$P\left\{\sigma^2 \le \frac{ns^2}{32.4}\right\} < 0.01.$$

In asserting that $\sigma^2$ was less than $ns^2/32.4$ (in our present case 0.01) we should be wrong
more than 99 times in 100 on the average.

There is a point of interest to note here. In Example 21.1 we considered a hypothesis
as to the mean $\mu$, and in the present example a hypothesis as to the variance $\sigma^2$. Had we
considered the two together, that is to say the compound hypothesis that $\mu = 16$ and
$\sigma^2 = 0.01$, we should have been in difficulties in justifying our procedure by reference to
confidence or fiducial intervals, since we could no longer assert that our conclusions were
right in an assigned proportion of cases. We have avoided this complication by
considering separately the hypotheses (a) that $\mu = 16$ whatever the variance, and (b) that
$\sigma^2 = 0.01$ whatever the mean. This resource is not as a rule open to us where non-normal
variation is concerned.


Tests of Normality 


21.18. In large samples we can group the data into ranges and compare the actual 
frequencies with those to be expected on the hypothesis of parent normality. This com- 
parison over the course of the frequency function is not satisfactory for small samples 
unless the grouping is so broad as to deprive the test of most of its efficacy. An alter- 
native is to compute some statistic of the sample and to examine how far it departs from 
the mean value to be expected on the hypothesis of parent normality. 

Consider, for instance, the statistic

$$t = \frac{k_3}{k_2^{3/2}}. \qquad (21.14)$$

This is independent of the mean (because the k-statistics are so) and is also independent
of the scale parameter because it is “studentised”. In normal samples, therefore, the
distribution of $t$ is independent of mean and variance and thus depends only on the sample
number $n$. We have already given formulae for its mean and variance (Exercise 11.16,
vol. I, p. 289). In fact,

$$\mu_1'(t) = \mu_3(t) = 0, \qquad \mu_2(t) = \frac{6n(n - 1)}{(n - 2)(n + 1)(n + 3)}. \qquad (21.15)$$

Since the distribution of $t$ is symmetrical we may, for moderate $n$, consider it as normally
distributed with zero mean and variance given by (21.15), and this will provide a test
(of a somewhat approximate kind) of normality in the parent from which the sample is
derived.


Example 21.3 
In the data of Examples 21.1 and 21.2 we have, in units of 0.1, for the sample mean about
origin 16 and the second and third moments about the mean,

$$m_1' = -0.8, \qquad m_2 = 2.16, \qquad m_3 = 0.496,$$

whence

$$k_2 = \frac{n}{n - 1}\, m_2 = 2.31429, \qquad k_3 = \frac{n^2}{(n - 1)(n - 2)}\, m_3 = 0.61319,$$

and

$$t = \frac{k_3}{k_2^{3/2}} = 0.174.$$


The variance of t, from (21.15), is 0.3188 and its standard error accordingly about
0.57. The observed deviation from zero is considerably less than this, and we see no reason
to doubt the hypothesis of normality so far as this test is concerned.
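
The arithmetic of this example can be retraced with a short sketch. The sample number n = 15 is not quoted in the example; it is an assumption inferred from the ratio 2.31429/2.16 = 15/14, and the function names are illustrative only.

```python
def k_statistics(n, m2, m3):
    """Return (k2, k3) from the central sample moments m2, m3 (divisor n)."""
    k2 = n / (n - 1) * m2
    k3 = n ** 2 / ((n - 1) * (n - 2)) * m3
    return k2, k3


def skewness_statistic(n, m2, m3):
    """The studentised statistic t = k3 / k2^(3/2) of (21.14)."""
    k2, k3 = k_statistics(n, m2, m3)
    return k3 / k2 ** 1.5


# n = 15 is inferred from the quoted figures, not stated in the text.
k2, k3 = k_statistics(15, 2.16, 0.496)
t = skewness_statistic(15, 2.16, 0.496)
print(round(k2, 5), round(k3, 5), round(t, 3))
```

The printed values may be compared with the figures 2.31429, 0.61319 and 0.174 given in the example.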


21.19. Another test of normality has been proposed by Geary (1935a), namely
the use of the ratio

$$ w = \frac{\text{mean deviation}}{\text{standard deviation}}. \tag{21.16} $$

If the parent mean is zero, the parent value of w is √(2/π) = 0.79788. The test has also
been adapted to the case when the parent mean is not zero, and tables provided for the
application of the test (Geary and Pearson, 1938).

Geary's ratio is directed towards detecting deviations from mesokurtosis in the parent.
The criterion based on k₄/k₂², which is a natural extension of that for skewness based on
k₃/k₂^{3/2}, is not very suitable for the purpose, since it has a skew distribution for quite high
values of n. The distribution of Geary's ratio tends to normality fairly rapidly
(cf. Exercise 21.2).


Tests of Goodness of Fit 

21.20. In Chapter 12 we considered in some detail the use of χ² in testing correspondence
between observation and hypothesis. If the hypothesis specifies the theoretical
values completely no question of estimation arises, and each cell contributing to χ² could,
if so desired, be tested separately. From this point of view χ² compounds into a single
test a number of tests of the kind already considered.

If the hypothesis does not specify the theoretical values completely, but leaves them
to be estimated in part from the data, some modification in the χ²-test is necessary. We
can now establish a result which in 12.13 was announced without proof: if the estimators
employed are maximum likelihood estimators, then for large samples the χ²-test of
significance retains its validity, provided that the number of degrees of freedom is reduced by
unity for every parameter estimated.

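Suppose the hypothesis leaves unspecified a parameter θ, and let t be its maximum
likelihood estimator. Then if the theoretical frequencies based on the true value of θ
are λ and those based on t are λ′, we may write, l being the observed frequencies,

$$ \chi^2 = \sum \frac{(l-\lambda)^2}{\lambda}, \qquad \chi'^2 = \sum \frac{(l-\lambda')^2}{\lambda'}. \tag{21.17} $$

χ² is distributed as the sum of squares of ν normal variates with unit variance. The problem
is to find the distribution of χ′². We have

$$ \chi^2 - \chi'^2 = \sum l^2 \left( \frac{1}{\lambda} - \frac{1}{\lambda'} \right), \tag{21.18} $$

and for large samples the proportionate difference between λ and λ′ will be of order n^{-1/2}.
We then have, expanding the difference in terms of δθ = t − θ, to order n^{-1},

$$ \chi^2 - \chi'^2 = \delta\theta \sum \frac{l^2}{\lambda'^2} \frac{\partial \lambda'}{\partial \theta}
 + (\delta\theta)^2 \sum l^2 \left\{ \frac{1}{\lambda'^3} \left( \frac{\partial \lambda'}{\partial \theta} \right)^2
 - \frac{1}{2\lambda'^2} \frac{\partial^2 \lambda'}{\partial \theta^2} \right\}. \tag{21.19} $$

Now for large samples the maximisation of the likelihood is equivalent to minimising χ′²,
and hence

$$ \sum \frac{l^2}{\lambda'^2} \frac{\partial \lambda'}{\partial \theta} = 0, $$

so that, to the same order (the sum of the second derivatives vanishing because the λ's have a fixed total),

$$ \chi^2 - \chi'^2 = (\delta\theta)^2 \sum \left\{ \frac{1}{\lambda} \left( \frac{\partial \lambda}{\partial \theta} \right)^2
 - \frac{1}{2} \frac{\partial^2 \lambda}{\partial \theta^2} \right\}
 = (\delta\theta)^2 \sum \left\{ \frac{1}{\lambda} \left( \frac{\partial \lambda}{\partial \theta} \right)^2 \right\}. \tag{21.20} $$

But the sum on the right is the reciprocal of the variance of the maximum likelihood
estimator, and writing δt for δθ, as is legitimate for large samples, we have

$$ \chi^2 - \chi'^2 = \frac{(\delta t)^2}{\operatorname{var} t}. \tag{21.21} $$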
The quantity on the right is itself the square of a variate which (in the limit) is normal
and has unit variance. Furthermore, its distribution is independent of that of χ′². For
consider the spherically symmetric density-distribution of the ν normal variables whose
sum of squares composes χ². Let O be the origin and P any point; then χ² = OP². Now
for large samples the variation takes place in the neighbourhood of O. A surface of
constant t through P is approximately plane in the effective range of variation. If OQ is the
normal to this surface, OQ is perpendicular to PQ and

$$ OP^2 = OQ^2 + PQ^2, $$

corresponding to the resolution of χ² into (δt)²/var t and χ′²; for t is chosen so as to
minimise χ′² = PQ². Thus if we take t as a new co-ordinate, together
with (ν − 1) others in the surface of constant t, the axis of t is orthogonal to the space of
constant t, and t will be independent of χ′².

It follows further that χ′² is distributed as the sum of (ν − 1) squares of normal
variates. Thus the usual Type III distribution of χ² holds for ν − 1 degrees of freedom;
and so for every constant fitted, with a reduction of unity in the number of degrees for
each constant. We have already exemplified the use of the result in Example 12.4 (Vol. I,
p. 301).


The ω²-distribution

21.21. For small samples the χ²-test is difficult to apply, since it depends for its
validity on the fact that the binomial distribution in individual cells may be represented
by the normal distribution, and hence requires that cell-frequencies shall not be small.
A test of a different kind has been proposed by Cramér (1928) and independently by von
Mises (1931).
Put

$$ \omega^2 = \int_{-\infty}^{\infty} \{ \bar F(x) - F(x) \}^2 \, dx, \tag{21.22} $$

where F̄(x) is the observed distribution function and F(x) the hypothetical distribution
function. The quantity ω² varies from sample to sample, its mean value being

$$ E(\omega^2) = \frac{1}{n} \int_{-\infty}^{\infty} F(x) \{ 1 - F(x) \} \, dx = \frac{1}{2n} \Delta_1, \tag{21.23} $$

where Δ₁ is Gini's coefficient of mean difference (cf. 2.24). For

$$ E(\omega^2) = \int_{-\infty}^{\infty} E \{ \bar F - F \}^2 \, dx. $$

For any given x the expectation of (F̄ − F)² is merely the variance of the proportion F̄
and hence is equal to F(1 − F)/n. The result (21.23) follows at once.


The ω²-test consists of comparing the observed with the mean value; but it is not
possible to express the comparison in terms of probability as the sampling distribution
of ω² is not known.
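
The mean value of ω² can be checked by simulation in a simple case. The sketch below assumes a hypothetical rectangular distribution on (0, 1), for which F(x) = x, the integral of F(1 − F) is 1/6 and the mean value of ω² is therefore 1/(6n). The function names, the seed and the number of repetitions are illustrative choices, not part of the text.

```python
import random


def omega_squared_uniform(sample):
    """Integral over (0, 1) of (F_obs(x) - x)^2 dx, F_obs being the sample step function."""
    xs = sorted(sample)
    n = len(xs)
    knots = [0.0] + xs + [1.0]
    total = 0.0
    for i in range(n + 1):
        c = i / n                      # value of F_obs between knots[i] and knots[i + 1]
        a, b = knots[i], knots[i + 1]
        # exact integral of (c - x)^2 from a to b
        total += ((b - c) ** 3 - (a - c) ** 3) / 3.0
    return total


def mean_omega_squared(n, reps, seed=0):
    """Average of omega^2 over `reps` simulated samples of n uniform variates."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(reps):
        acc += omega_squared_uniform([rng.random() for _ in range(n)])
    return acc / reps


n = 10
estimate = mean_omega_squared(n, reps=20000)
expected = 1.0 / (6 * n)               # (1/n) * integral of x(1 - x) over (0, 1)
print(estimate, expected)
```

The simulated average should lie close to 1/(6n); the sketch says nothing about the sampling distribution of ω² itself.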


21.22. The numerical evaluation of the integral (21.22) is tedious in the case of a
continuous distribution, and Wold (1938a) has suggested a modification. If the variate
range is divided into intervals at −∞, x₁, x₂, . . . x_j, . . . ∞, we define

$$ w^2 = \sum_j \{ \bar F(x_j) - F(x_j) \}^2 \, h_j, \tag{21.24} $$

where h_j is the width of the jth interval. If the intervals are all of width h,

$$ E(w^2) = \frac{1}{n} \int_{-\infty}^{\infty} F(x) \{ 1 - F(x) \} \, dx + R_n, \tag{21.25} $$

where R_n is a remainder term. If this may be neglected, the w²-test is equivalent to the
ω²-test but easier to apply. If the data are ungrouped, the x_j's may be taken at equidistant
intervals.

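In the particular case when F is normal with unit variance we have

$$ n E(\omega^2) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{x} \int_{x}^{\infty} e^{-\frac12 u^2} e^{-\frac12 v^2} \, dv \, du \, dx. \tag{21.26} $$

Putting u = α + x and v = β + x, we find, after integration with respect to x,

$$ n E(\omega^2) = \frac{1}{2\sqrt{\pi}} \int_{-\infty}^{0} \int_{0}^{\infty} \exp \{ -\tfrac14 (\alpha - \beta)^2 \} \, d\beta \, d\alpha. $$

A further substitution of γ = α − β and δ = α + β gives

$$ n E(\omega^2) = \frac{1}{\sqrt{\pi}}. \tag{21.27} $$

21.23. An interesting modification of the ω²-test has been given by Smirnoff (1936)
who defines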


a=] T nsgp ae oe ees ae. L328) 


The difference lies in the differential element which has the effect of rendering
the distribution of ω² independent of F. It is shown that as n tends to infinity the
distribution function of ω² tends to the form


2 kn —4ztot, 
s > f e a A ACE 
m (k-1n V(— & sin 2) 


k=l 
but this does not look a very promising formula for application in particular cases. 
Cramér (1928) has extended formula (21.27) to the goodness of fit of Gram-Charlier 
series and gives some examples of fitting to observed distributions. 


Difference of Two Means 

21.24. A common case occurring in practice is that of two independent samples of
n₁ and n₂ members from two populations which may or may not be different. We wish
to decide whether the evidence indicates a significant difference between the parent means. 
This situation forms a kind of border-line case between the testing of a prior value of a 
parameter and the homogeneity tests which we shall consider below. It is a test of homo- 
geneity in the sense that we are to discuss the question whether two populations are equal 
in certain respects ; but we do not necessarily assume that they are identical, and in any 
case we can regard the problem as equivalent to the testing of a single parameter (the 
difference of the means) to see whether it is different from zero. 


21.25. For large samples we discussed the question in Example 9.10 (Vol. I, p. 226) 
and gave two tests. If the hypothesis is that the parent populations are identical (a true 
hypothesis of homogeneity) we may pool the samples to form a single sample and test 
whether either mean differs from the mean of the total. If, however, we wish to test the 
less general hypothesis that the parents have the same mean but not necessarily the same 
variance, we may test the difference of means by the ordinary equation expressing the 
variance of a difference in terms of the separate variances. This is not a homogeneity test 
in the strictest sense of the word, but tests of such a character may conveniently be dis- 
cussed in conjunction with the other type, both for small and for large samples. 


21.26. We now consider the corresponding problem when the samples are small
and the parent populations are assumed to be normal. In the first place we take the
case when the two populations have the same variance σ².

The sample means x̄₁ and x̄₂ are distributed normally with variances σ²/n₁ and σ²/n₂ and
means μ₁ and μ₂. Consequently x̄₁ − x̄₂ − (μ₁ − μ₂) is distributed normally with variance
σ²(1/n₁ + 1/n₂), and

$$ \frac{\bar x_1 - \bar x_2 - (\mu_1 - \mu_2)}{\sigma \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \tag{21.30} $$

is distributed normally with unit variance about zero mean. Further, if S₁² and S₂² are
the sample sums of squares about the mean, the quantity

$$ \frac{S_1^2 + S_2^2}{\sigma^2} \tag{21.31} $$

is distributed as χ² with n₁ + n₂ − 2 degrees of freedom, independently of the expression
(21.30). It follows that

$$ u = \frac{\bar x_1 - \bar x_2 - (\mu_1 - \mu_2)}{\sqrt{S_1^2 + S_2^2}} \sqrt{\frac{n_1 n_2 (n_1 + n_2 - 2)}{n_1 + n_2}} \tag{21.32} $$

is distributed like "Student's" t with ν = n₁ + n₂ − 2 degrees of freedom. This expression
does not contain the unknown σ and hence may be used to test the difference μ₁ − μ₂.
This result is due to Fisher (1926a).


Example 21.4
In a class of 20 children, 10 chosen at random were given a ration of orange-juice
each day for a certain period and the other 10 a ration of milk. Their gains in weight
during the period were, in pounds:—
    First group:  4, 2½, 3½, 4, 1½, 1, 3½, 3, 2½, 3½
    Second group: 1½, 3½, 2½, 3, 2½, 2, 2, 2½, 1½, 3

The mean increase in the first group is 2.9 pounds, and in the second 2.4 pounds. Putting
aside other explanations, one possible factor accounting for this difference is the difference
in treatments. But we wish to know in the first place whether this is significant. We
assume, then, that treatment exerted no differential effect and that the samples came
from normal populations with the same mean and variance. We find

$$ \bar x_1 = 2.9, \qquad \bar x_2 = 2.4, $$
$$ \sum (x_1 - \bar x_1)^2 = 9.4, \qquad \sum (x_2 - \bar x_2)^2 = 3.9. $$

Hence, from (21.32), with μ₁ − μ₂ = 0,

$$ u = \frac{0.5}{\sqrt{13.3}} \sqrt{\frac{10 \times 10 \times 18}{20}} = 1.30. $$

From Appendix Table 3 (vol. I, p. 441) we see that such a value would be exceeded in
absolute value with probability 0.21. The difference of a half-pound between the sample
means is not significant.

We note incidentally that the sample variances, 0.940 and 0.390, differ considerably,
and shall see below how the significance of the difference may be tested. At the present
stage our conclusion as to the non-significance of the difference of means is to be regarded
with reserve, for the data themselves suggest that we have over-simplified the problem
in assuming equal variance in the two populations.
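
The calculation of this example may be sketched as follows. The gains are entered as decimals (the halves being restored so that the group totals are 29 and 24); the function name is illustrative, and only the statistic is computed, the probability being left to the tables.

```python
import math


def pooled_u(x1, x2):
    """Statistic u of (21.32) with mu1 - mu2 = 0, and its degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((x - m1) ** 2 for x in x1)   # sum of squares about the mean, S1^2
    ss2 = sum((x - m2) ** 2 for x in x2)   # S2^2
    u = (m1 - m2) / math.sqrt(ss1 + ss2) * math.sqrt(n1 * n2 * (n1 + n2 - 2) / (n1 + n2))
    return u, n1 + n2 - 2


first = [4, 2.5, 3.5, 4, 1.5, 1, 3.5, 3, 2.5, 3.5]
second = [1.5, 3.5, 2.5, 3, 2.5, 2, 2, 2.5, 1.5, 3]
u, dof = pooled_u(first, second)
print(round(u, 2), dof)
```

The value of u is then referred to "Student's" distribution with the stated degrees of freedom.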


21.27. Apart from the question of unequal variances, the data of the previous
example will serve to illustrate a further point of interest. Our hypothesis is that the
children within each group may be regarded as a sample from a population with the same
mean. Had we been dealing with a sample of, say, seedlings grown from the seed of a
single plant, this hypothesis would not have been unreasonable; but children differ very
much among themselves in nutritional standard, and so forth. Our hypothesis is again
liable to over-simplify the problem.

When the statistician can direct the sampling himself, this kind of problem can be
tackled with success by pairing. Suppose we select children in pairs of the same sex,
each pair resembling each other as closely as possible in all the factors which might influence
the experiment such as age, weight and nutritional standard. We allot at random one
member to the first group and one to the second, and so for each pair. The differences
in weights gained between members of a pair may then be regarded as samples from
a population with zero mean, even if the pairs differ among themselves, and the set of
differences tested in the usual way.


Example 21.5 


Suppose that, in the previous example, the data had related to 10 pairs of children,
thus:—


    No. of Pair    First Group,    Second Group,    Difference,
                   wt. in lbs.     wt. in lbs.      First − Second
         1              4               1½               2½
         2              2½              3½              −1
         3              3½              2½               1
         4              4               3                1
         5              1½              2½              −1
         6              1               2               −1
         7              3½              2                1½
         8              3               2½               ½
         9              2½              1½               1
        10              3½              3                ½

      Totals           29              24                5


For the values in the last column we find

$$ \bar d = 0.5, \qquad s^2 = 1.25, \qquad \nu = 9, $$
$$ t = \frac{0.5}{\sqrt{1.25}} \sqrt{9} = 1.34. $$


The probability of obtaining such a value or greater (absolutely) is about 0.22, and
the observed differences are therefore not significant. This is the same conclusion that
we reached in Example 21.4, but it would not have been surprising had the conclusions
differed, for they relate to different questions.
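
A minimal sketch of the paired calculation, assuming the ten differences of the last column (2½, −1, 1, 1, −1, −1, 1½, ½, 1, ½) and the variance s² taken with divisor n as in the example; the function name is illustrative.

```python
import math


def paired_t(differences):
    """Student's t for a set of paired differences, with s^2 computed on divisor n."""
    n = len(differences)
    mean = sum(differences) / n
    s2 = sum((d - mean) ** 2 for d in differences) / n
    nu = n - 1
    return mean / math.sqrt(s2) * math.sqrt(nu), mean, s2, nu


diffs = [2.5, -1, 1, 1, -1, -1, 1.5, 0.5, 1, 0.5]
t, mean, s2, nu = paired_t(diffs)
print(round(t, 2), mean, s2, nu)
```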


Difference of Means when Variances are Unequal 
21.28. When population variances are not assumed equal the t-test of difference 
of means no longer applies. We can, if we choose, apply a test based on fiducial intervals, 
namely, the Behrens test, considered in the previous chapter. We put 
Cae ue SUB 3057. CFT: 
Veg + ap) oe 


The fiducial limits of d for various significance levels have been tabulated by Sukhatme
(1938b) and Fisher (1941a) for n₁ and n₂ greater than 5. If the observed d falls inside the
range, we may accept the hypothesis that the population means are equal.


21.29. As we have seen, an inference of this kind does not imply that we shall be 
correct in a certain proportion of the cases, and if we wish to find a test satisfying such 
a criterion a different approach is necessary. The following investigation is due to Welch 
(1938b). 

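Consider the distribution of u of equation (21.32) when the means are the same but
the variances are different, i.e.

$$ u = \frac{\bar x_1 - \bar x_2}{\sqrt{S_1^2 + S_2^2}} \sqrt{\frac{n_1 n_2 (n_1 + n_2 - 2)}{n_1 + n_2}}. \tag{21.34} $$

Write

$$ \chi' = \frac{\bar x_1 - \bar x_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}, \tag{21.35} $$

$$ w = \frac{(\sigma_1^2 \chi_1^2 + \sigma_2^2 \chi_2^2) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}{(n_1 + n_2 - 2) \left( \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2} \right)}, \tag{21.36} $$

where σ₁²χ₁² = S₁², and hence χ₁² is distributed as χ² with ν₁ = n₁ − 1 degrees of freedom,
and similarly for χ₂². χ′ may be regarded as a single normal variate with zero mean and
unit variance. We have then

$$ u = \frac{\chi'}{\sqrt{w}}. \tag{21.37} $$

Now put

$$ w = a \chi_1^2 + b \chi_2^2, \tag{21.38} $$

where, from (21.36),

$$ a = \frac{\left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right) \sigma_1^2}{(n_1 + n_2 - 2) \left( \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2} \right)}, \qquad
   b = \frac{\left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right) \sigma_2^2}{(n_1 + n_2 - 2) \left( \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2} \right)}. \tag{21.39} $$

w itself is not distributed in the Type III form unless σ₁ = σ₂, but we will find a distribution
of that form which approximates to it by equating lower moments. The first two moments
of w, being the sum of the separate parts, are

$$ \mu_1'(w) = a \nu_1 + b \nu_2, \qquad \mu_2(w) = 2 (a^2 \nu_1 + b^2 \nu_2). \tag{21.40} $$

The moments of

$$ dF = \frac{1}{(2g)^{\frac12 \nu} \, \Gamma(\tfrac12 \nu)} \, w^{\frac12 \nu - 1} e^{-w/2g} \, dw $$

are

$$ \mu_1' = g \nu, \qquad \mu_2 = 2 g^2 \nu. \tag{21.41} $$

Identifying (21.40) and (21.41) we find—

$$ g = \frac{a^2 \nu_1 + b^2 \nu_2}{a \nu_1 + b \nu_2}, \qquad \nu = \frac{(a \nu_1 + b \nu_2)^2}{a^2 \nu_1 + b^2 \nu_2}. \tag{21.42} $$

With these values of g and ν the distribution of w/g is approximately of the Type III form
with ν degrees of freedom and will be independent of χ′. Hence,

$$ \frac{\chi' \sqrt{\nu}}{\sqrt{w/g}} = u \sqrt{\nu g} \tag{21.43} $$

is distributed approximately as "Student's" t with ν degrees of freedom. In particular,
if σ₁ = σ₂, a = b and we reduce to the test of 21.26.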

21.30. In general, when σ₁ ≠ σ₂ the quantities g and ν depend on the ratio
θ = σ₁²/σ₂². We have

$$ \nu = \frac{(\theta \nu_1 + \nu_2)^2}{\theta^2 \nu_1 + \nu_2}, \tag{21.44} $$

and may put u = ct where c = 1/√(νg), and hence

$$ c = \left\{ \frac{(\nu_1 + \nu_2) \left( \dfrac{\theta}{n_1} + \dfrac{1}{n_2} \right)}{\left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right) (\theta \nu_1 + \nu_2)} \right\}^{1/2}. \tag{21.45} $$

Without a definite knowledge of θ we cannot apply the t-test, but the advantage of putting
the expressions in this form is that by considering particular values of θ we are able to
judge how far the test based on "Student's" distribution is likely to be affected.
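
The dependence of ν and c on θ is easily tabulated. The following is a sketch of the two formulae, ν = (θν₁ + ν₂)²/(θ²ν₁ + ν₂) and c = 1/√(νg), as they follow from (21.39) and (21.42); the function name and the trial values of θ are illustrative assumptions.

```python
import math


def welch_u_approximation(theta, n1, n2):
    """Effective degrees of freedom nu and scale c (u = c t) for variance ratio theta."""
    v1, v2 = n1 - 1, n2 - 1
    nu = (theta * v1 + v2) ** 2 / (theta ** 2 * v1 + v2)
    c = math.sqrt((v1 + v2) * (theta / n1 + 1.0 / n2)
                  / ((1.0 / n1 + 1.0 / n2) * (theta * v1 + v2)))
    return nu, c


for theta in (0.1, 1.0, 10.0):
    print(theta, welch_u_approximation(theta, 10, 10), welch_u_approximation(theta, 5, 15))
```

For equal sample numbers c is unity for every θ and only ν varies; for unequal numbers c departs from unity, which is the source of the error in the ordinary test.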


Example 21.6 (from Welch, 1938b)
Consider the case n₁ = n₂ = 10. From (21.45) we have c = 1 and from (21.44)

$$ \nu = \frac{9 (\theta + 1)^2}{\theta^2 + 1}. $$

Suppose now we were to use the test of 21.26, based on the assumption that θ = 1. We
should find, to a probability level of 0.05, that |u| must exceed 2.101 to be significant.
If we judge u significant for such values how far are we in error when θ is not unity? That
is to say, what are the true probabilities

$$ P \{ |u| > 2.101 \} $$

for varying values of θ, as compared with our value of 0.05?
For a specified θ the probabilities can easily be obtained from the approximate
distribution of u√(gν) of equation (21.43). They are shown graphically in Fig. 21.1. The full
line (a) shows P for various values of θ and n₁ = n₂ = 10. The full line (b) shows similarly
the values for n₁ = 5, n₂ = 15. (The dotted line (c) we refer to below.)

Fig. 21.1. (Figure not reproduced: values of P plotted against values of θ, logarithmic scale.)

In case (a) the line does not deviate very much from the horizontal at P = 0.05, and we may
conclude that the test based on the assumption of equal variance is not very much in error. In
any case, if the curve falls below the line P = 0.05 we are on the safe side, for our true
probability is then less than 0.05, and in rejecting the hypothesis at that level we are adopting
more stringent standards than is apparent.

In case (b), when the sample numbers are unequal we have a different state of affairs.
For θ < 1 the test is very conservative, but for θ > 1 it may err very seriously in the
wrong direction.


21.31. Welch concludes that for samples of equal size there is not a serious likelihood
of error in testing the difference of means as if the parent variances were equal. For
samples of unequal size the error may invalidate the t-test and an alternative criterion is
proposed. Write

$$ v = \frac{\bar x_1 - \bar x_2}{\left\{ \dfrac{S_1^2}{n_1 (n_1 - 1)} + \dfrac{S_2^2}{n_2 (n_2 - 1)} \right\}^{1/2}}. \tag{21.46} $$

Here, it will be observed, the denominator is an estimate of (σ₁²/n₁ + σ₂²/n₂)^{1/2}, the standard
deviation of the difference x̄₁ − x̄₂. Precisely as for u we approximate to the distribution
of this denominator by a Type III form. Corresponding to (21.39) we find

$$ a = \frac{\sigma_1^2}{n_1 (n_1 - 1)} \Big/ \left( \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \right), \qquad
   b = \frac{\sigma_2^2}{n_2 (n_2 - 1)} \Big/ \left( \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \right). \tag{21.47} $$

Corresponding to (21.45) we find c = 1, and to (21.44)

$$ \nu = \left( \frac{\theta}{n_1} + \frac{1}{n_2} \right)^2 \Big/ \left\{ \frac{\theta^2}{n_1^2 (n_1 - 1)} + \frac{1}{n_2^2 (n_2 - 1)} \right\}. \tag{21.48} $$

v is then distributed approximately in "Student's" form with ν degrees of freedom. The
dotted line (c) in Fig. 21.1 shows the relationship between θ and P {|v| > 2.101} for
n₁ = 5, n₂ = 15. Clearly the error is now much smaller than when we used u for the same
sample numbers.
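
A sketch of the criterion v follows, applied to the two groups of Example 21.4 (gains entered as decimals). The text gives ν as a function of the unknown ratio θ; replacing the parent variances by the sample estimates in that formula, as is done here, is an assumption of this sketch and not a step taken in the text. The function name is illustrative.

```python
import math


def welch_v(x1, x2):
    """Welch's v of (21.46) and an estimated nu (sample variances substituted for parent ones)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((x - m1) ** 2 for x in x1)
    ss2 = sum((x - m2) ** 2 for x in x2)
    e1 = ss1 / (n1 * (n1 - 1))         # estimated variance of the first mean
    e2 = ss2 / (n2 * (n2 - 1))         # estimated variance of the second mean
    v = (m1 - m2) / math.sqrt(e1 + e2)
    nu = (e1 + e2) ** 2 / (e1 ** 2 / (n1 - 1) + e2 ** 2 / (n2 - 1))
    return v, nu


first = [4, 2.5, 3.5, 4, 1.5, 1, 3.5, 3, 2.5, 3.5]
second = [1.5, 3.5, 2.5, 3, 2.5, 2, 2, 2.5, 1.5, 3]
v, nu = welch_v(first, second)
print(round(v, 3), round(nu, 2))
```

With equal sample numbers v coincides numerically with u; only the degrees of freedom are reduced.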
Difference of Two Variances in Normal Samples
21.32. If we have samples of n₁ and n₂ members from normal populations with
variances σ₁² and σ₂², the ratio of sample variances p² = s₁²/s₂² is distributed in the form (cf.
Example 10.18, vol. I, p. 249)—

$$ dF \propto \frac{p^{\,n_1 - 2} \, dp}{\left( \dfrac{n_1 p^2}{\sigma_1^2} + \dfrac{n_2}{\sigma_2^2} \right)^{\frac12 (n_1 + n_2 - 2)}}. \tag{21.49} $$

The related quantity

$$ z = \tfrac12 \log_e \left\{ \frac{n_1 (n_2 - 1)}{n_2 (n_1 - 1)} \, \frac{\sigma_2^2}{\sigma_1^2} \, p^2 \right\} \tag{21.50} $$

is distributed in Fisher's form

$$ dF = \frac{2 \, \nu_1^{\frac12 \nu_1} \nu_2^{\frac12 \nu_2}}{B(\tfrac12 \nu_1, \tfrac12 \nu_2)} \,
        \frac{e^{\nu_1 z} \, dz}{(\nu_1 e^{2z} + \nu_2)^{\frac12 (\nu_1 + \nu_2)}}, \tag{21.51} $$

where ν₁ = n₁ − 1, ν₂ = n₂ − 1. The ν's may, by a convenient extension of our previous
terminology, be called the degrees of freedom associated with z. In practice, z is generally
used in preference to p, but tables of both are available.

These distributions provide a test of significance of the equality of σ₁² and σ₂².
On the hypothesis of equality they are independent of the ratio σ₁²/σ₂² and the probability of
an observed p or z can be obtained. As usual, if this is small we reject the hypothesis.
We leave it to the reader to show that this type of inference can be based on the theory
of confidence intervals or the theory of fiducial intervals in the usual way.


Example 21.7
In Example 21.4 we had two samples of children and found that the difference in
means was not significant. This was on the hypothesis that the variances were identical,
and since the two samples are equal in number the inference remains valid even if the
variances are different, as illustrated in 21.31. We will now test directly whether the
sample variances themselves indicate any significant difference in parent variances.
We have

$$ \sum (x_1 - \bar x_1)^2 = 9.40, \quad \nu_1 = 9; \qquad \sum (x_2 - \bar x_2)^2 = 3.90, \quad \nu_2 = 9. $$

Hence

$$ z = \tfrac12 \log_e \left( \frac{9.40}{9} \Big/ \frac{3.90}{9} \right) = 0.4399. $$

From Appendix Tables 4 and 5 of Vol. I (pp. 442-3) we see that for ν₂ = 9 the 5-per-cent
points of z are
    ν₁ = 8,  0.5862
    ν₁ = 12, 0.5613
and the 1-per-cent points are
    ν₁ = 8,  0.8494
    ν₁ = 12, 0.8157.
Thus, notwithstanding that one variance is about 2½ times the other, the probability that
the observed z will be exceeded on random sampling from populations with the same
variance is greater than 0.05, and the difference of sample variances is not significant.
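
The value of z in this example may be verified by a short sketch. The function name is illustrative; the tabulated 5-per-cent points quoted above are entered by hand for comparison and are not computed.

```python
import math


def fisher_z(ss1, nu1, ss2, nu2):
    """z = (1/2) log_e of the ratio of the two variance estimates ss/nu."""
    return 0.5 * math.log((ss1 / nu1) / (ss2 / nu2))


z = fisher_z(9.40, 9, 3.90, 9)
five_per_cent_points = {8: 0.5862, 12: 0.5613}   # quoted from the tables, for nu2 = 9
print(round(z, 4), all(z < point for point in five_per_cent_points.values()))
```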


There is a point here which is frequently overlooked. In carrying out the z-test we
always take the ratio of the larger variance to the smaller, so that our probability levels
relate, not to the chance that a given pair of variances have a larger ratio than the observed
one, but to the chance that the bigger of the two exceeds the smaller in a certain ratio.
A probability of 0.05 thus relates to the chance that either s₁²/s₂² exceeds a given amount
k, or s₁²/s₂² falls short of a given amount 1/k. If we are interested only in the former
contingency our probabilities should be halved.


Properties of Fisher's Distribution 

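21.33. The z-distribution plays a very important part in statistical inference based
on small samples, and we digress at this point to give an account of its main features.

The distribution function of z may be obtained from the incomplete B-function, for
z may be easily transformed into a Type I variate. There are, however, special tables
for lower values of ν₁ and ν₂ and satisfactory approximations of various kinds for higher
values.

The characteristic function of z is proportional to

$$ \int_{-\infty}^{\infty} \frac{e^{(\nu_1 + \zeta) z} \, dz}{(\nu_1 e^{2z} + \nu_2)^{\frac12 (\nu_1 + \nu_2)}}, $$

where ζ = it, and is thus

$$ \phi(t) = \left( \frac{\nu_2}{\nu_1} \right)^{\frac12 \zeta}
   \frac{\Gamma \{ \tfrac12 (\nu_1 + \zeta) \} \, \Gamma \{ \tfrac12 (\nu_2 - \zeta) \}}{\Gamma(\tfrac12 \nu_1) \, \Gamma(\tfrac12 \nu_2)}. \tag{21.52} $$

Thus, taking logarithms and using the expansion

$$ \log \Gamma(1 + x) = \tfrac12 \log 2\pi + (x + \tfrac12) \log x - x + \frac{1}{12x} - \dots, $$

we find

$$ \log \phi(t) = - \frac{\zeta}{2} \left( \frac{1}{\nu_1} - \frac{1}{\nu_2} \right)
   + \frac{\zeta^2}{4} \left( \frac{1}{\nu_1} + \frac{1}{\nu_2} \right) + \dots \tag{21.53} $$

Thus, for large ν₁ and ν₂, z is distributed normally with mean −½(1/ν₁ − 1/ν₂) and
variance ½(1/ν₁ + 1/ν₂).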


21.34. Various approximations have been given for the case when ν₁ and ν₂ are
not large enough to justify the assumption of normality.

(a) (Cornish and Fisher, 1937). The method is that of 6.32 and depends on the
expansion of the distribution in a Gram-Charlier series. From the successive derivatives
of log Γ(1 + x) we can find those of φ(t), and hence ascertain the cumulants of z. Writing
r₁ = 1/ν₁ and r₂ = 1/ν₂, we find

$$ \begin{aligned}
\kappa_1 &= -\tfrac12 (r_1 - r_2) - \tfrac16 (r_1^2 - r_2^2) \\
\kappa_2 &= \tfrac12 (r_1 + r_2) + \tfrac12 (r_1^2 + r_2^2) + \tfrac13 (r_1^3 + r_2^3) \\
\kappa_3 &= -\tfrac12 (r_1^2 - r_2^2) - (r_1^3 - r_2^3) \\
\kappa_4 &= (r_1^3 + r_2^3) + 3 (r_1^4 + r_2^4) \\
\kappa_5 &= -3 (r_1^4 - r_2^4) \\
\kappa_6 &= 12 (r_1^5 + r_2^5)
\end{aligned} \tag{21.54} $$

Hence, putting σ = r₁ + r₂ and δ = r₁ − r₂, we find for the l's of 6.32 (m = 0,
variance = ½σ)—

E 
n= [2494 400) 


b=4(o+5) 4 4 (o 4-305, 


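and so on. After some reduction we find, for the value of z corresponding to a probability
α (which in turn corresponds to a normal deviate ξ),—

$$ \begin{aligned}
z = {} & \xi \sqrt{\frac{\sigma}{2}} - \frac{\delta}{6} (\xi^2 + 2)
      + \sqrt{\frac{\sigma}{2}} \left\{ \frac{\sigma}{24} (\xi^3 + 3\xi) + \frac{\delta^2}{72 \sigma} (\xi^3 + 11 \xi) \right\} \\
     & - \frac{\delta \sigma}{120} (\xi^4 + 9 \xi^2 + 8) + \frac{\delta^3}{3240 \sigma} (3 \xi^4 + 7 \xi^2 - 16) \\
     & + \sqrt{\frac{\sigma}{2}} \left\{ \frac{\sigma^2}{1920} (\xi^5 + 20 \xi^3 + 15 \xi)
       + \frac{\delta^2}{2880} (\xi^5 + 44 \xi^3 + 183 \xi)
       + \frac{\delta^4}{155520 \sigma^2} (9 \xi^5 - 284 \xi^3 - 1513 \xi) \right\}.
\end{aligned} \tag{21.55} $$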


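(b) (Fisher, extended by Cochran, 1940a). Writing n indifferently for ν₁ and ν₂ we
have, from (21.55), to order n⁻²—

$$ z = \xi \sqrt{\frac{\sigma}{2}} - \frac{\delta}{6} (\xi^2 + 2)
     + \sqrt{\frac{\sigma}{2}} \left\{ \frac{\sigma}{24} (\xi^3 + 3\xi) + \frac{\delta^2}{72 \sigma} (\xi^3 + 11 \xi) \right\}. $$

Put h = 2/σ. Then

$$ z = \frac{\xi}{\sqrt{h}} - \frac{\delta}{6} (\xi^2 + 2) + \frac{\xi^3 + 3 \xi}{12 h^{3/2}} + \frac{(\xi^3 + 11 \xi) \, \delta^2 \sqrt{h}}{144}. \tag{21.56} $$

Now

$$ \frac{1}{\sqrt{h - \lambda}} = \frac{1}{\sqrt{h}} + \frac{\lambda}{2 h^{3/2}} + O(n^{-5/2}). $$

Hence, if we put

$$ z = \frac{\xi}{\sqrt{h - \lambda}} - \frac{\delta}{6} (\xi^2 + 2), \tag{21.57} $$

the difference of this quantity from (21.56) is

$$ \frac{(\xi^3 + 11 \xi) \, \delta^2 \sqrt{h}}{144}, $$

provided that we take λ = (ξ² + 3)/6. The difference is small in virtue of the large
denominator and the factor δ² = (1/ν₁ − 1/ν₂)², which is small if ν₁ and ν₂ are not too
different. Thus we may take z as approximately given by (21.57). The values of λ for
various values of the significance level are


    Level    40%     30%     20%     10%     5%      1%      0.1%
    λ        0.51    0.55    0.62    0.77    0.95    1.40    2.09


For the commoner levels of significance the form taken by (21.57) is

$$ \text{20 per cent. level:} \quad z = \frac{0.8416}{\sqrt{h - 0.62}} - 0.4514 \left( \frac{1}{\nu_1} - \frac{1}{\nu_2} \right) \tag{21.58} $$

$$ \text{5 per cent. level:} \quad z = \frac{1.6449}{\sqrt{h - 0.95}} - 0.7843 \left( \frac{1}{\nu_1} - \frac{1}{\nu_2} \right) \tag{21.59} $$

$$ \text{1 per cent. level:} \quad z = \frac{2.3263}{\sqrt{h - 1.40}} - 1.235 \left( \frac{1}{\nu_1} - \frac{1}{\nu_2} \right) \tag{21.60} $$

$$ \text{0.1 per cent. level:} \quad z = \frac{3.0902}{\sqrt{h - 2.09}} - 1.9256 \left( \frac{1}{\nu_1} - \frac{1}{\nu_2} \right) \tag{21.61} $$

The accuracy of the approximation for ν₁ = 24, ν₂ = 60 may be judged from the following
comparison:—


    Level        Value of z from
    per cent.    (21.57).           Exact Value.
    20           0.1337             0.1338
    1            0.3748             0.3746
    0.1          0.4966             0.4955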
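
The comparison may be reproduced by a sketch of the approximation z = ξ/√(h − λ) − (ξ² + 2)(1/ν₁ − 1/ν₂)/6, with h = 2/(1/ν₁ + 1/ν₂) and λ = (ξ² + 3)/6. The normal deviate ξ is taken from Python's standard library rather than from tables, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def fisher_cochran_z(level, nu1, nu2):
    """Approximate upper `level` point of z for nu1, nu2 degrees of freedom."""
    xi = NormalDist().inv_cdf(1.0 - level)      # normal deviate for the chosen level
    h = 2.0 / (1.0 / nu1 + 1.0 / nu2)
    lam = (xi ** 2 + 3.0) / 6.0
    return xi / math.sqrt(h - lam) - (1.0 / nu1 - 1.0 / nu2) * (xi ** 2 + 2.0) / 6.0


for level in (0.20, 0.01, 0.001):
    print(level, round(fisher_cochran_z(level, 24, 60), 4))
```

The printed values may be set against the middle column of the comparison for ν₁ = 24, ν₂ = 60.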


(c) (Paulson, 1942). The Wilson-Hilferty approximation to χ² of 12.7 indicates that
(χ²/ν)^{1/3} is distributed normally about mean 1 − 2/(9ν) with variance 2/(9ν). The ratio
s₁²/s₂² itself is the ratio of two independent quantities each distributed as χ² divided by
its degrees of freedom, ν₁ and ν₂. Further, in virtue of Geary's theorem (Vol. I, p. 253)
the ratio of two such normal variates can be thrown into a form which is normally
distributed in standard measure.

We may thus regard

$$ \frac{\left( 1 - \dfrac{2}{9 \nu_2} \right) p^{2/3} - \left( 1 - \dfrac{2}{9 \nu_1} \right)}
        {\left( \dfrac{2}{9 \nu_2} \, p^{4/3} + \dfrac{2}{9 \nu_1} \right)^{1/2}} \tag{21.62} $$

as approximately normally distributed in standard measure. The approximation seems
remarkably good. For instance, the following shows the exact and approximate values
of p² for ν₁ = 6, ν₂ = 12.


    Level        s₁²/s₂² = p², from
    per cent.    (21.62)               Exact Value.
    20           1.72                  1.72
    5            3.00                  3.00
    1            4.85                  4.82
    0.1          8.58                  8.38
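
A sketch of the normal deviate of (21.62), written in terms of the variance ratio p² = s₁²/s₂² so that p^{2/3} is the cube root of the ratio. The function name is illustrative; the check uses the tabulated exact 5-per-cent value 3.00 for ν₁ = 6, ν₂ = 12, which should give a deviate near 1.645.

```python
import math


def paulson_deviate(ratio, nu1, nu2):
    """Approximately standard normal deviate corresponding to a variance ratio."""
    cube_root = ratio ** (1.0 / 3.0)
    a1 = 2.0 / (9.0 * nu1)
    a2 = 2.0 / (9.0 * nu2)
    numerator = (1.0 - a2) * cube_root - (1.0 - a1)
    denominator = math.sqrt(a2 * cube_root ** 2 + a1)
    return numerator / denominator


deviate = paulson_deviate(3.00, 6, 12)
print(round(deviate, 3))
```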
The Problem of k Samples

21.35. We now proceed to consider the case when we have samples from k different
populations and wish to determine whether there is any evidence of significant differences
between those populations. In some cases the appropriate test can be carried out by the
χ²-distribution, particularly if the data are grouped. For the groups may then be regarded
as determining the rows of a contingency table and the different samples the columns, and
a homogeneity test applied to the table in the manner of Chapter 12. Again, we may
compare the samples pair by pair by the foregoing methods; but this, apart from being
tedious, does not give us what we want, namely a test of homogeneity of the set of samples
taken together.


21.36. Consider in the first instance the sampling of attributes. Suppose we have
samples from populations in which the true proportions of successes are ϖ, the observed
proportions being p₁ . . . p_k and the sample numbers n₁ . . . n_k, totalling n.

If p is the mean proportion of successes in all samples taken together, and our hypothesis
is that the populations have a common value, p will be an estimate of ϖ and we have for
the variance of p_j—

$$ \operatorname{var} p_j = \frac{pq}{n_j} \quad \text{approximately,} \tag{21.63} $$

where p = Σ(n_j p_j)/Σ n_j and q = 1 − p.

It follows that (p_j − p)/√(pq/n_j) will be distributed normally about zero mean with unit
variance, and hence

$$ \chi^2 = \frac{\sum n_j (p_j - p)^2}{pq} \tag{21.64} $$

is distributed in the Type III form with k − 1 degrees of freedom (not k because we have
lost a degree by estimating p). Hence the ratio

$$ Q^2 = \frac{\sum n_j (p_j - p)^2}{pq (k - 1)} \tag{21.65} $$

has expectation unity. The quantity Q is called the Lexis ratio, after the author who
first discussed it in detail (Lexis, 1903).*


* Lexis first developed the use of Q in a paper "Über die Theorie der Stabilität statistischer Reihen,"
1879, Conrad's Jahrbücher, 32, 60, reproduced in the reference given above. He dealt, however, only
with the case when all the n's were equal and had no knowledge of the sampling distribution of Q. In
practical applications he took as each n_j the average for the group. "Den dadurch begangenen Fehler
kann man beurteilen wenn man n einmal mit der grössten und einmal mit der kleinsten Grundzahl
berechnet." (The error thereby committed can be judged by calculating once with the largest and
once with the smallest value of n.)
Example 21.8
From 1910 to 1919 the numbers of live male and female births in England and Wales
were as follows:—


    Year.     Male Births.    Female Births.    Total Births.    Proportion
                                                                 Male/Total.
    1910        457,266          439,696           896,962         0.5098
    1911        448,933          432,205           881,138         0.5095
    1912        445,004          427,733           872,737         0.5099
    1913        449,159          432,731           881,890         0.5093
    1914        447,184          431,912           879,096         0.5087
    1915        415,205          399,409           814,614         0.5097
    1916        402,137          383,383           785,520         0.5119
    1917        341,361          326,985           668,346         0.5108
    1918        339,112          323,549           662,661         0.5117
    1919        356,241          336,197           692,438         0.5145
    Totals    4,101,602        3,933,800         8,035,402         0.5104


The proportion of male births showed an increase during the war years 1916-1919.
This is a well-known effect of war, but suppose we had noticed it here for the first time.
The natural question is: can the effect be accidental? There is no doubt about its reality,
for the data cover the whole population; but if we suppose that sex at birth is distributed
according to the laws of chance, do the differences observed suggest that in the ten years
concerned there was a significant change in the population (as regards proportion of male
births)? Let us consider the homogeneity test applied to the 10 proportions.

We have p = 0.5104, n = 8,035,402, k − 1 = ν = 9 and the sum Σn_j (p_j − p)² will
be found to be 19.895,783. Hence

$$ Q = \left\{ \frac{19.895{,}783}{9 \times 0.5104 \times 0.4896} \right\}^{1/2} = 2.974, $$
$$ \chi^2 = (k - 1) Q^2 = 79.618. $$

Q is sufficiently far from unity to reject decisively the hypothesis that the data are
homogeneous. A χ²-test will confirm the conclusion. We infer that, whatever the reason,
the differences in proportions of male births, slight as they are, cannot be accounted for
on the supposition that the distribution of sex is according to chance in samples from
a constant population. We may observe that, had we obtained the same proportions
for a sample one-tenth the size, χ² would have been 7.962 and we should not have inferred
non-homogeneity.
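
The Lexis ratio for these births may be computed directly from the counts. The sketch below carries the proportions at full precision instead of four decimals, so its χ² agrees with the figure of the example only to about the first decimal place; the function and variable names are illustrative.

```python
import math

male = [457266, 448933, 445004, 449159, 447184,
        415205, 402137, 341361, 339112, 356241]
female = [439696, 432205, 427733, 432731, 431912,
          399409, 383383, 326985, 323549, 336197]


def lexis(successes, failures):
    """Return (chi2, Q) for k samples of attributes, as in (21.64) and (21.65)."""
    totals = [s + f for s, f in zip(successes, failures)]
    p = sum(successes) / sum(totals)            # pooled proportion
    q = 1.0 - p
    spread = sum(n * (s / n - p) ** 2 for s, n in zip(successes, totals))
    chi2 = spread / (p * q)
    k = len(totals)
    return chi2, math.sqrt(chi2 / (k - 1))


chi2, Q = lexis(male, female)
print(round(chi2, 1), round(Q, 2))
```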
21.37. A similar test may be applied with k samples of variables. Let the samples be

    x₁₁, x₁₂, . . . x₁n₁ with mean x̄₁,
    x₂₁, x₂₂, . . . x₂n₂  „    „   x̄₂,
    . . . . . . . . . . . . . . . . .
    x_k1, x_k2, . . . x_kn_k  „    „   x̄_k.

The variance of the jth sample is

$$ \frac{1}{n_j} \sum_i (x_{ji} - \bar x_j)^2, $$

and an estimate of the population variance may be obtained by taking the weighted mean
of sample variances

$$ s_1^2 = \frac{1}{n - k} \sum_j \sum_i (x_{ji} - \bar x_j)^2. \tag{21.66} $$

Here we have reduced the divisor to n − k so as to correspond with the number of degrees
of freedom.

Furthermore x̄_j will be distributed with variance σ²/n_j and hence (assuming without
loss of generality that the parent mean is zero),

$$ E \sum_{j=1}^{k} \{ n_j (\bar x_j - \bar x)^2 \} = \sum \{ E (n_j \bar x_j^2) \} - E (n \bar x^2) = k \sigma^2 - \sigma^2 = (k - 1) \sigma^2. $$

Putting then

$$ s_2^2 = \frac{1}{k - 1} \sum_j n_j (\bar x_j - \bar x)^2, \tag{21.67} $$

we have another estimate of σ². Within sampling limits s₁² and s₂² should be equal. If
they are not, we suspect the homogeneity of the population.
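
The two estimates of variance may be illustrated on a small artificial set of samples. The data below are invented for the purpose and are not taken from the text; the function name is likewise illustrative.

```python
def variance_estimates(samples):
    """Return (s1_sq, s2_sq): the within-sample and between-sample estimates of variance."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]
    grand_mean = sum(sum(s) for s in samples) / n
    within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    return within / (n - k), between / (k - 1)


# Hypothetical data: three samples of three members each.
samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
s1_sq, s2_sq = variance_estimates(samples)
print(s1_sq, s2_sq)
```

Here the second estimate greatly exceeds the first, as would be expected when the sample means differ more than sampling fluctuation allows.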


21.38. The above test is a simple form of the analysis of variance, which we shall
study extensively in Chapters 23 and 24; it is therefore unnecessary for us to develop it
further at the present stage. Essentially the test is one of simultaneous significance of
differences between means on the assumption that variances are constant. We shall also
discuss in Chapter 26 a generalisation of the variance ratio for testing the homogeneity
of a set of variances.


Example 21.9

The following table (from the Registrar-General's Statistical Review of England and 
Wales for 1933, Part II) shows the numbers of males married in England in that year 
classified according to age and district. (Certain small numbers of unspecified age and 
those under 21 have been omitted.) 


                                   Age (Years).
District.        21-       25-       30-       35-       45-       55-      Totals.

South-East     31,714    43,979    14,995     7,985     3,928     3,717    106,318
North          31,507    39,849    13,620     7,108     3,362     2,916     98,362
Midland        17,465    21,486     6,729     3,340     1,624     1,509     52,153
East            4,016     5,297     1,820       962       457       386     12,938
South-West      4,323     6,065     2,218     1,177       514       580     14,877

Totals         89,025   116,676    39,382    20,572     9,885     9,108    284,648


Note the changes in interval at 25- and 35- years.

The question we shall consider is whether age at marriage differs significantly between
different districts. This might, for example, be an important point if we were about to
sample the population for some quality related to age at marriage, such as the number
of children per family. The data might be regarded as a contingency table and $\chi^2$ used
as a test of independence in the usual way. Here we adopt an alternative by considering
the mean age at marriage in the five different districts.

Taking the centres of the intervals to be 23, 27.5, 32.5, 40, 50 and 57.5 years (the latter
being admittedly an approximation) and making no corrections for grouping, we find:


                                  Mean        Sum of Squares
District.            Number.     (years).     of Deviations     Variance.
                                              from Mean.

South-East           106,318    29.681799       7,092,490        66.710
North                 98,362    29.312626       6,092,375        61.938
Midland               52,153    29.007344       3,105,520        59.546
East                  12,938    29.425761         807,911        62.445
South-West            14,877    29.873731       1,025,284        68.917

Whole population     284,648    29.429049      18,143,921        63.741


The total of the sum of squares about district means, $\sum_j \sum_i (x_{ij} - \bar{x}_j)^2$, is the sum of the
figures in the fourth column, namely 18,123,580. The sum of squares $\sum_j n_j (\bar{x}_j - \bar{x})^2$ is
found to be 20,341. We have the useful check that these two together are equal to the
sum of squares of deviations from the population mean, 18,143,921 (a property which we
shall often require in the analysis of variance).

Thus

$$s_1^2 = \frac{18{,}123{,}580}{284{,}643} = 63.67,$$

$$s_2^2 = \frac{20{,}341}{4} = 5085.25.$$


No test of significance is required to see that the difference in mean age at marriage between 
districts is not a chance effect. 
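
The arithmetic of this example may be retraced from the frequency table itself. The sketch below assumes, as in the text, that every member of an age-group is placed at the centre of its interval and that no grouping corrections are made; the variable names are ours and purely illustrative.

```python
# Centres of the age intervals and the frequencies from the table of Example 21.9.
centres = [23.0, 27.5, 32.5, 40.0, 50.0, 57.5]
counts = {
    "South-East": [31714, 43979, 14995, 7985, 3928, 3717],
    "North": [31507, 39849, 13620, 7108, 3362, 2916],
    "Midland": [17465, 21486, 6729, 3340, 1624, 1509],
    "East": [4016, 5297, 1820, 962, 457, 386],
    "South-West": [4323, 6065, 2218, 1177, 514, 580],
}


def grouped_mean_and_ss(freqs, mids):
    """Number, mean and sum of squares about the mean for grouped data."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids))
    return n, mean, ss


stats = {d: grouped_mean_and_ss(f, centres) for d, f in counts.items()}
n_total = sum(n for n, _, _ in stats.values())
grand_mean = sum(n * m for n, m, _ in stats.values()) / n_total
within_ss = sum(ss for _, _, ss in stats.values())
between_ss = sum(n * (m - grand_mean) ** 2 for n, m, _ in stats.values())

k = len(counts)
s1_sq = within_ss / (n_total - k)   # estimate (21.66)
s2_sq = between_ss / (k - 1)        # estimate (21.67)
print(round(s1_sq, 2), round(s2_sq, 2))
```

The identity noted in the text, that the two sums of squares together make up the sum of squares about the population mean, holds for `within_ss + between_ss` up to rounding error.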


Tests of Random Order 

21.39. The tests described above are concerned with the values of a number of
sample members but not with the order in which these values occur. Sometimes there
may not be an order, as, for instance, if a number of plants are grown simultaneously or
a number of names drawn from a hat in a single handful. More frequently there is a
temporal order of appearance in the values, and it is clear that, on some occasions at least,
the order may be material. To take an extreme case, suppose we are told that in a sample
of 100 births 53 are male. We conclude that the sample is concordant with the hypothesis
that male and female births occur at random with probability $\tfrac{1}{2}$. But if we knew in addition
that the first 53 births were male and the next 47 female we should almost certainly reject
the hypothesis.


21.40. If sampling is conducted by taking members one at a time from a population
and the process is random, then any order is as probable as any other order. The sample
may be considered as a section of an infinite series generated by the sampling process, and
this series ought to behave like von Mises' Irregular Kollektiv (7.15). It is a happy
hunting-ground for the theorist, since there is no limit to the number of tests which can
be invented to ascertain whether a given finite series conforms to the random scheme. We
have considered a few such tests in connection with random sampling numbers (8.15)
and shall discuss others in connection with time-series (Chapter 30). Here we discuss a
few tests which are useful in detecting departures from randomness in the sampling. We
are not now considering hypotheses as to the parent population, but since the randomness
of the sampling is an essential element of inferences in probability it is convenient to
consider the reliability of the sampling, together with inferences from the sample about
the parent.


Ranking Tests 


21.41. Suppose we have a sample of $n$ members $x_1 \ldots x_n$, in that order, and are
doubtful about its randomness. Such doubts may arise owing either to defects in the
sampling or to possible alterations in the population while the sampling is going on. In
the first case the process itself is at fault; in the second, circumstances are at work to make
the sample something other than it purports to be, a random sample from a single
population. Either influence may relate the magnitude of the $x$'s to the order in which they
occur, and the values $x_1 \ldots x_n$ are not then a random order in the sense that any other
order was equally probable.

Let us then consider all the possible orders, $n!$ in number, of the observed values
$x_1 \ldots x_n$. A proportion of these, determined by a significance level of 5 per cent. or
1 per cent., say, we will decide to reject as improbable; and we will select as the
“improbable” rankings those which exhibit the systematic appearance of which we are afraid,
and particularly the regular rise or fall from $x_1$ to $x_n$ in magnitude. In short, we rank the
sample in order of magnitude, say $X_1 \ldots X_n$, where the $X$'s are a permutation of
the first $n$ integers, and compute a rank correlation coefficient between this order and the
order $1 \ldots n$. If the coefficient is large in absolute value (“large” being determined
by the significance level) we suspect the sample of being subject to systematic influences.


Example 21.10


Thirty persons in the income group £1000-£1500 are asked to supply returns of their
annual income for some purpose connected with taxation. It is intended to summarise
their replies by a given date, but when that date arrives only 20 answers have been received.
This is a frequent event in postal inquiries, even when the return is compulsory, and it
has to be decided whether the 20 returns may be accepted as representative of the 30.
There are prior reasons for suspecting that persons with bigger incomes may delay more
than the others, partly because of difficulty in completing returns and partly because of
a natural reluctance to part with information which may tell against them.* We therefore
wish to ascertain from the 20 returns whether there is any evidence that persons with
smaller incomes tend to submit returns earlier than those with larger incomes.

Suppose the 20 returns give incomes, in that order, of £ per annum: 1180, 1270, 1400,
1090, 1190, 1250, 1170, 1300, 1290, 1310, 1280, 1350, 1320, 1380, 1420, 1390, 1470, 1360,
1220, 1460. The ranking order is:


* This is an assumption for the purposes of the example and not intended as a statement about
taxation returns in real life.


No. of sample    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Rank             3   7  17   1   4   6   2  10   9  11   8  13  12  15  18  16  20  14   5  19
Difference      -2  -5 -14   3   1   0   5  -2   0  -1   3  -1   1  -1  -3   0  -3   4  14   1

The sum of squares of differences is 508 and thus the Spearman coefficient of rank
correlation between observed and natural order $1 \ldots n$ is

$$\rho = 1 - \frac{6 \times 508}{7980} = 0.618.$$

The probability of obtaining such a value or greater (16.18) may be found from “Student's”
distribution by putting

$$t = \rho \left( \frac{n-2}{1-\rho^2} \right)^{\frac{1}{2}} = 3.34, \qquad \nu = 18,$$

and is found from Appendix Table 3, vol. I, to be about 0.004. The test confirms our
suspicion that size of income is correlated with order of appearance, and if we intend to
use the mean income of the 20 returns as an estimate of the income in the full 30 we must
recognise that it may very well be an under-estimate.
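
The ranking, the sum of squared differences and the value of t in this example can be checked mechanically. The following sketch assumes that there are no tied incomes (as is the case here); the helper `ranks` is our own and is introduced only for the illustration.

```python
import math

# Incomes in order of receipt, as given in Example 21.10.
incomes = [1180, 1270, 1400, 1090, 1190, 1250, 1170, 1300, 1290, 1310,
           1280, 1350, 1320, 1380, 1420, 1390, 1470, 1360, 1220, 1460]


def ranks(values):
    """Rank of each value in order of magnitude (1 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


n = len(incomes)
rank = ranks(incomes)
# Difference between position in the sample and rank, squared and summed.
d_sq = sum((pos - r) ** 2 for pos, r in enumerate(rank, start=1))
rho = 1 - 6 * d_sq / (n ** 3 - n)
t = rho * math.sqrt((n - 2) / (1 - rho ** 2))
print(d_sq, round(rho, 3), round(t, 2))
```

The value of `t` is then referred to “Student's” distribution with n - 2 = 18 degrees of freedom, as in the text.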


21.42. It will be noted in this example that we have made no assumption about 
the distribution of incomes in the sample or the population (the latter of which would 
certainly not be normal) and have used the sample values themselves without any reference 
to the question whether they were representative. This does not invalidate our inference,
which is made within the population of samples obtained by permuting the observed values.
(Cf. 17.44 and 17.45.)


21.43. A second test of use in random series, particularly when it is suspected that
cyclical effects are present, may be obtained by counting the occurrences of “peaks” or
“troughs” in the series. A member is said to be a “peak” if it is greater than the two
neighbouring members, and a “trough” if it is less than those members. In either case
it is a “turning-point”. The interval between turning-points is called a “phase”.

Three consecutive observations are required to define a turning-point. If the series
is random the probability that any given three provides a turning-point is $\tfrac{2}{3}$, for the values
$x_1, x_2, x_3$ may occur in six orders and in only four is the greatest or least value the middle
one. In a series of $N$ terms there are $N - 2$ sets of three, and hence the expected number
of turning-points $p$ is

$$E(p) = \tfrac{2}{3}(N - 2). \tag{21.68}$$

The variance and higher moments of $p$ are not so easy to determine. Like the ranking
problems considered in Chapter 16 (to which the present problem is analogous), the
distributions resulting are rather complicated. We quote without proof the results

$$\mu_2(p) = \frac{16N - 29}{90}, \tag{21.69}$$

$$\mu_3(p) = -\frac{16(N + 1)}{945}, \tag{21.70}$$

$$\mu_4(p) = \frac{448N^2 - 1976N + 2301}{4725}. \tag{21.71}$$

As $N$ tends to infinity the distribution tends to normality fairly rapidly, and $p$ may,
for finite $N$, be taken as normally distributed about mean $\tfrac{2}{3}(N - 2)$ with variance
$(16N - 29)/90$.
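
The turning-point test lends itself to a simple routine. The sketch below counts peaks and troughs and expresses the count as a normal deviate using (21.68) and (21.69); it assumes a series without equal neighbouring values, and the function names and the short series are hypothetical.

```python
import math


def turning_points(series):
    """Number of members greater than, or less than, both their neighbours."""
    count = 0
    for a, b, c in zip(series, series[1:], series[2:]):
        if (b > a and b > c) or (b < a and b < c):
            count += 1
    return count


def turning_point_deviate(series):
    """Observed count, its expectation (21.68), variance (21.69) and normal deviate."""
    n = len(series)
    p = turning_points(series)
    mean = 2 * (n - 2) / 3
    var = (16 * n - 29) / 90
    return p, mean, var, (p - mean) / math.sqrt(var)


# Hypothetical series which oscillates more regularly than a random one would.
series = [3, 7, 2, 8, 5, 6, 1, 9, 4, 10]
p, mean, var, z = turning_point_deviate(series)
print(p, round(mean, 3), round(var, 3), round(z, 3))
```

For so short a series the normal approximation is of course rough; the sketch is meant only to show the order of the computation.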


21.44. A further test may be derived from the distribution of phase lengths. The
probability of a phase of length $d$ in a series of $d + 1$ terms is clearly $\dfrac{2}{(d+1)!}$, for only
two of the possible permutations are favourable. In a series of length $N$ there are
$N - d - 2$ possible phases of length $d$, for $d + 3$ points are required to determine the
phase. The probability of a phase $d$ in $d + 3$ terms is

$$2 \left\{ \frac{1}{(d+1)!} - \frac{1}{(d+2)!} - \frac{1}{(d+2)!} + \frac{1}{(d+3)!} \right\} = \frac{2(d^2 + 3d + 1)}{(d+3)!}, \tag{21.72}$$

and hence the number of phases of length $d$ is

$$\frac{2(N - d - 2)(d^2 + 3d + 1)}{(d+3)!}. \tag{21.73}$$

Now the number of possible phases is

$$\frac{2N - 7}{3} + \frac{2}{N!}, \tag{21.74}$$

for there is one fewer phase than turning-points, $\tfrac{2}{3}(N - 2)$ in number, and the whole
series may be a phase, which accounts for the factor $2/N!$. In practice this is negligible,
and for the probability of a phase $d$ in a series of $N$ we then have (21.73) divided by (21.74),
namely

$$\frac{6(d^2 + 3d + 1)(N - d - 2)}{(2N - 7)(d+3)!}. \tag{21.75}$$

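The moments of this distribution are easily obtained to a very close approximation.
For example,

$$\mu_1'(d) = \frac{6}{2N-7} \sum_{d=1}^{N-3} \frac{d\,(N - d - 2)(d^2 + 3d + 1)}{(d+3)!}$$

$$= \frac{6}{2N-7} \sum_{d=1}^{N-3} \Big[ (N-2)\{(d+3)(d+2)(d+1) - 3(d+3)(d+2) + 5(d+3) - 3\}$$
$$\qquad - \{(d+3)(d+2)(d+1)d - 3(d+3)(d+2)(d+1) + 8(d+3)(d+2) - 13(d+3) + 9\} \Big] \Big/ (d+3)!$$

$$= \frac{6}{2N-7} \sum_{d=1}^{N-3} \left[ (N-2) \left\{ \frac{1}{d!} - \frac{3}{(d+1)!} + \frac{5}{(d+2)!} - \frac{3}{(d+3)!} \right\} - \left\{ \frac{1}{(d-1)!} - \frac{3}{d!} + \frac{8}{(d+1)!} - \frac{13}{(d+2)!} + \frac{9}{(d+3)!} \right\} \right].$$

Remembering the rapid convergence of $\sum \dfrac{1}{n!}$ to $e$, we may write this as

$$\mu_1'(d) = \frac{6}{2N-7} \Big[ (N-2)\{(e-1) - 3(e-2) + 5(e - \tfrac{5}{2}) - 3(e - \tfrac{8}{3})\}$$
$$\qquad - e + 3(e-1) - 8(e-2) + 13(e - \tfrac{5}{2}) - 9(e - \tfrac{8}{3}) \Big]$$

$$= \frac{3}{2N-7}(N + 7 - 4e) \sim 1.5. \tag{21.76}$$

Similarly we find

$$\mu_2(d) = \frac{3}{(2N-7)^2} \{ (8e - 21) N^2 + (4e - 17) N - (48e^2 - 140e + 14) \} \sim 0.560. \tag{21.77}$$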
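
The closed forms (21.76) and (21.77) can be compared with a direct summation of the distribution (21.75). The following is a sketch under the same approximation as the text (the term 2/N! is neglected, so that the probabilities sum to unity only very nearly); the function names are our own.

```python
import math


def phase_probability(d, n):
    """Approximate probability (21.75) of a phase of length d in a series of n terms."""
    return (6 * (d * d + 3 * d + 1) * (n - d - 2)
            / ((2 * n - 7) * math.factorial(d + 3)))


def summed_moments(n):
    """Mean and variance of phase length by summation over d = 1, ..., n - 3."""
    probs = {d: phase_probability(d, n) for d in range(1, n - 2)}
    mean = sum(d * p for d, p in probs.items())
    second = sum(d * d * p for d, p in probs.items())
    return mean, second - mean ** 2


def closed_form_moments(n):
    """Mean (21.76) and variance (21.77) in closed form."""
    e = math.e
    mean = 3 * (n + 7 - 4 * e) / (2 * n - 7)
    var = (3 * ((8 * e - 21) * n ** 2 + (4 * e - 17) * n
                - (48 * e ** 2 - 140 * e + 14)) / (2 * n - 7) ** 2)
    return mean, var


n_terms = 48  # length of series, chosen to match Example 21.11
print(summed_moments(n_terms))
print(closed_form_moments(n_terms))
```

For a series of this length the two pairs of values agree to many decimal places, which illustrates how rapidly the sums of reciprocal factorials converge.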


21.45. In comparing observed distributions of phases with expected values the
ordinary $\chi^2$-test cannot be applied, because the probabilities of the events in a finite series
are not independent. A test of significance has been derived by Wallis and Moore (1941),
who consider a grouping into three categories, $d = 1$, $d = 2$ and $d \geq 3$. They conclude
that $\chi^2$ calculated from these three groups can be tested in the usual Type III form
with $\nu = 2\tfrac{1}{2}$ if $\chi^2 \geq 6.3$. For lower values $\tfrac{6}{7}\chi^2$ can be tested in that form with $\nu = 2$.

This test is independent of the law of distribution of the variables and is thus of general 
application. It has to be remembered, however, that generality in these matters may 
be offset by loss of sensitivity, and more searching tests may be required in certain cases. 


Example 21.11 


The following table shows the deviations from a moving nine-year average of potato 
yields in England and Wales for the years 1888-1935 (units are yth ton) :— 


Year. | Yield. Year. Yield. Year. Yield. Year. | Yield. 
| | *| 
| 
1888 —6 1900 =T 1912 -15T 1924 A 
89 FZP 0l +6 P 13 + 3P 295 | + 22 | 
90 -4T 02 -3 14 +2 230 |— 9T 
91 -3 03 —-T7T 15 +1 2 |-—-383 
92 -1 04 FAP 16 = 290 28 + 9P 
93 +6P 05 oT 17 +5P 29 +5 
94 ET 06 +1P 18 +4 30 + 1 
95 +7P 07 =T 19 = 47 31 OT. 
96 +3 08 +8P 20 Susy 32 mM 
97 —emT 09 +4 21 9m 33 + 2 
98 ee 10 FST 22 +11P 34 + 5 P 
99 0 1 +4P 23 — 1 ) 35 Ad 


We have marked with P and T the peaks and troughs of the series. The observed
number of turning-points is 31 in a series of 48 terms. The expected number is, from
(21.68), $\tfrac{2}{3}(48 - 2) = 30.67$, almost exactly the number observed. No test of significance
is required.

The duration of phases is:


Phase length $d$      Observed      Predicted (21.75)
1                       20             18.75
2                        6              8.07
3 and over               4              3.18

Total                   30             30.00


Here, again, a test is hardly necessary. We find, in fact, $\chi^2 = 0.826$, $\tfrac{6}{7}$ of which for
$\nu = 2$ is not significant.

We conclude that these tests provide no evidence against the randomness of the series
and hence do not suggest any cyclical movement in the yields.


21.46. In the foregoing example we have treated the two values in 1923 and 1924
as a single value since they are equal. These so-called “ ties ” frequently occur in ranking 
work and are a great nuisance. In the present case there is only one, and any reasonable 
method of treating it will not affect the test. Where “ ties” are numerous enough to 
make a serious difference some systematic method of treating them is desirable, particularly 
if more than two individuals are tied. They may be treated as a single observation, as 
in this case (although it would probably be better then to reduce N accordingly); or, 
preferably, they may be counted as a mean value, e.g. with a tied pair we should consider 
the first as greater than the second and then the second greater than the first, counting the 
number of turning-points or phases as one-half in each case and adding the two together. 
This, as in all similar ranking problems, makes the theoretical discussion of sampling very 
complicated, and if it is desired to make a precise use of significance tests a further possi- 
bility is to assume that the tied members are ranked in the order most unfavourable to 
the hypothesis under test, so as to be on the safe side. 


Conditional Tests 

21.47. When several unknown parameters are concerned, it may be difficult to find 
a sampling distribution dependent only on one of them which will form a basis for estimation 
or a test of significance. Sometimes, however, we can get rid of undesirable parameters 
by restricting the distribution in some way, and particularly by considering a distribution 
of samples which have some specified quality in common with the observed sample. Such 
distributions we shall, in Bartlett's phrase, call conditional. Fisher expresses a similar 
idea by speaking of samples which have the same configuration.

The most important application of this principle is in the testing of regression
coefficients, which we shall consider in the next chapter. Here we give a simple illustration 
of the method for the Poisson distribution. 


Example 21.12 

Suppose we have two samples from populations which are known to give the Poisson 
type of distribution but may have different parameters. We wish to determine whether 
the populations could be identical. 

Suppose the frequencies of successes in the two samples are $r_1$ and $r_2$. If $\lambda$ is the
parameter of the parent (assumed the same for each), the probabilities of the samples are

$$\frac{e^{-\lambda} \lambda^{r_1}}{r_1!} \quad \text{and} \quad \frac{e^{-\lambda} \lambda^{r_2}}{r_2!},$$

and their joint probability is accordingly

$$P\{r_1, r_2 \mid \lambda\} = \frac{e^{-2\lambda} \lambda^{r_1 + r_2}}{r_1!\, r_2!}. \tag{21.78}$$

This depends on $\lambda$ and does not help us in answering the question. However, for the
probability of a sample with $r_1 + r_2$ successes we have (since the sum of two Poisson variates
with parameters $\lambda_1$, $\lambda_2$ is distributed in the same form with parameter $\lambda_1 + \lambda_2$):

$$P\{r_1 + r_2 \mid \lambda\} = \frac{e^{-2\lambda} (2\lambda)^{r_1 + r_2}}{(r_1 + r_2)!},$$

and hence

$$\frac{P\{r_1 + r_2 \mid \lambda\}}{P\{r_1, r_2 \mid \lambda\}} = \frac{(2\lambda)^{r_1 + r_2}}{(r_1 + r_2)!} \cdot \frac{r_1!\, r_2!}{\lambda^{r_1 + r_2}} = \frac{2^r\, r_1!\, r_2!}{r!}, \tag{21.79}$$

where $r = r_1 + r_2$.
Now in accordance with Bayes' theorem we have

$$P\{r_1, r_2 \mid \lambda\} = P\{r_1, r_2 \mid r_1 + r_2\}\, P\{r_1 + r_2 \mid \lambda\},$$

and hence

$$P\{r_1, r_2 \mid r\} = \frac{r!}{2^r\, r_1!\, r_2!}. \tag{21.80}$$

Consequently, if we confine our attention to samples for which the total number of successes
is $r$, the probability of the observed $r_1$ and $r_2$ is independent of $\lambda$ and is, in fact, the
corresponding term in the binomial $(\tfrac{1}{2} + \tfrac{1}{2})^r$. The probability is clearly that of a partition of
$r$ into the observed $r_1$ and $r_2$, and if it is small we suspect the hypothesis that the samples
emanated from the same population.

This kind of conditional inference raises the same sort of point as we noticed in 17.44.
We decide beforehand that, whatever $r$ turns out to be, we will make the inference in the
population of samples which yield that value of $r$.
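
The conditional probability (21.80) is readily computed. In the sketch below the observed counts (3 and 12) are hypothetical, and the rule of adding together all partitions no more probable than the observed one is an assumption of ours for turning the single term into a tail probability; the text itself says only that a small probability throws suspicion on the hypothesis.

```python
from math import comb


def conditional_probability(r1, r2):
    """Term of the binomial (1/2 + 1/2)^r for the partition (r1, r2), as in (21.80)."""
    r = r1 + r2
    return comb(r, r1) / 2 ** r


def tail_probability(r1, r2):
    """Sum of the probabilities of all partitions of r no more probable than (r1, r2).

    This two-sided convention is an assumption made for the illustration.
    """
    r = r1 + r2
    observed = comb(r, r1)
    return sum(comb(r, j) for j in range(r + 1) if comb(r, j) <= observed) / 2 ** r


r1, r2 = 3, 12  # hypothetical numbers of successes in the two samples
print(conditional_probability(r1, r2), tail_probability(r1, r2))
```

Neither quantity involves the unknown Poisson parameter, which is the whole point of restricting attention to samples with the observed total.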


Pitman's Tests 

21.48. In the extreme conditional case we may consider an inference in a population 
of samples the members of which are the same as those actually observed, the population 
being given by permutations or partitions of the observed values. The tests of ranking 
and periodicity given above are cases of this kind. A similar procedure has been advocated 
by Fisher in the analysis of variance and the design of experiments, and will be considered 
in due course. We now proceed to examine tests of the same nature proposed by Pitman 
(1937a, 1938). 

Suppose we have two sets of values $u_1 \ldots u_m$ and $v_1 \ldots v_n$ with means $\bar{u}$ and $\bar{v}$
and the mean of the two together equal to $\bar{z}$. Given $m + n$ objects, there are $\binom{m+n}{m}$
ways, say $N$, of separating them into two sets of $m$ and $n$ objects, of which the given set
is one. We call $|\bar{u} - \bar{v}|$ the spread of the separation. Since

$$m\bar{u} + n\bar{v} = (m + n)\,\bar{z},$$

we have also for the spread

$$\frac{(m+n)\,|\bar{u} - \bar{z}|}{n} = \frac{(m+n)\,|\Sigma(u) - m\bar{z}|}{mn}. \tag{21.81}$$

Take a probability $1 - \alpha = M/N$, where $M$ is an integer. If $R$ is a particular separation,
and the number of separations with spread not less than that of $R$ is not greater than $M$,
we call $R$ discordant. If there are $M$ or more with a greater spread we call it concordant.
A separation which is neither concordant nor discordant is called neutral. If $m = n$ the
separations occur in pairs with equal spreads, and we then take $M$ to be even. The
discordant separations are most easily picked out as those with the largest values of
$|\Sigma u - m\bar{z}|$.

If the observed separation is arrived at by chance, the probability that it is discordant
is $M/N = 1 - \alpha$ when there are no neutral separations. If such exist, the probability
is less than $1 - \alpha$. Similarly the probability that a separation is concordant is $1 - \alpha$,
or more, as the case may be.

Two samples $u_1 \ldots u_m$ and $v_1 \ldots v_n$ are said to be discordant, concordant or
neutral according as the separations $u$ and $v$ are so. Having selected our significance
points dependent on $\alpha$, and hence having fixed $M$, we can find for what values of the spreads
a pair of samples is discordant or otherwise, and hence whether our observed pair is so.
If they are discordant we reject the hypothesis that they came from the same population.


Example 21.13 (Pitman, 1937a) 
Two samples have the following values :— 


0, 11, 12, 20 
16, 19, 22, 24, 29. 


Are they significantly different ? 


There are 9 members altogether and hence $\binom{9}{5} = 126$ separations into samples of
five and four. We take $\alpha$ to be as near as possible to 0.95, corresponding to a 5-per-cent.
level of significance, and hence $M = 6$. We then find the groups which have the largest
values of the spread. We have $\bar{z} = 17$, so that $m\bar{z} = 68$, and using the form $|\Sigma u - 68|$
we find those groups of four from


0, 11, 12, 16, 19, 20, 22, 24, 29,


which give the maximum value to this quantity. They are:


Group of four          $|\Sigma u - 68|$
0, 11, 12, 16               29
0, 11, 12, 19               26
0, 11, 12, 20               25
29, 24, 22, 20              27
29, 24, 22, 19              26
29, 24, 20, 19              24


The group 0, 11, 12, 20 gives the fifth largest spread, and so with $M = 6$ the observed
separation is discordant. Our inference is that the samples come from different
populations. Only in four other cases out of 126 should we get so large a spread in samples from
the same population.
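
With so few members the whole population of separations can be enumerated directly. The following sketch repeats the reckoning of Example 21.13 by brute force; it is an illustration of the principle only, and the variable names are our own.

```python
from itertools import combinations

u = [0, 11, 12, 20]
v = [16, 19, 22, 24, 29]
pooled = u + v
m = len(u)
target = m * sum(pooled) / len(pooled)  # m times the pooled mean, here 68


def spread_measure(group):
    """|sum(u) - m * z_bar|, which orders separations in the same way as the spread."""
    return abs(sum(group) - target)


observed = spread_measure(u)
measures = [spread_measure(g) for g in combinations(pooled, m)]
n_separations = len(measures)                         # 126
n_at_least = sum(1 for x in measures if x >= observed)  # separations with spread not less than observed

M = 6
print(n_separations, observed, n_at_least, n_at_least <= M)
```

The count `n_at_least` includes the observed separation itself, so that a value of 5 corresponds to the “four other cases” of the text.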


21.49. The extended use of the above test is barred by practical inconvenience,
but an approximate form based on a different measure of discordance may be used. We
now put

$$w = \frac{m\,(\bar{u} - \bar{z})^2}{n\,\mu_2}, \tag{21.82}$$

where $\mu_2$ is the variance of the samples taken together and is thus a constant. The function
$w$ is hence linear in $(\bar{u} - \bar{z})^2$, the device of squaring, as usual, getting rid of difficulties
associated with the use of the modulus $|\bar{u} - \bar{z}|$. $N$ here refers to the total sample
number. For the moments of $\bar{u} - \bar{z}$ we may use the results of 11.26 (vol. I, p. 284), giving
the moments of the mean in sampling from a finite population; for $\bar{z}$ is the population
mean. Replacing $n$ in the formulae of that section by $m$ and putting $N = m + n$,
we have:

$$E(\bar{u} - \bar{z}) = 0,$$

$$E(\bar{u} - \bar{z})^2 = \frac{(N - m)\,\mu_2}{m\,(N - 1)},$$

$$E(\bar{u} - \bar{z})^4 = \frac{(N - m)\left[\{N^2 + N - 6m(N - m)\}\,\mu_4 + 3N(N - m - 1)(m - 1)\,\mu_2^2\right]}{m^3 (N - 1)(N - 2)(N - 3)},$$

and hence for the first two moments of $w$ we find

$$E(w) = \frac{1}{N - 1}, \tag{21.83}$$

$$E(w^2) = \frac{3}{N^2 - 1}(1 + \theta), \tag{21.84}$$

where

$$\theta = \frac{\tfrac{1}{3}(N + 1)\,\gamma_2 + 2}{(N - 2)(N - 3)} \left\{ \frac{N(N + 1)}{m(N - m)} - 6 \right\}, \tag{21.85}$$


$\gamma_2$ referring to the measure of kurtosis $\mu_4/\mu_2^2 - 3$.

For fixed $N$ the modulus of the second factor in (21.85) will be found to have a maximum
at $\dfrac{2(N - 2)}{N}$ when $m = \tfrac{1}{2}N$, and it takes this value again at

$$\left( \frac{N - 2m}{N} \right)^2 = \frac{N - 2}{2N - 1},$$

giving $\dfrac{m}{N - m} = \tfrac{1}{5}$ or 5 for $N = 14$ and wider limits for larger $N$. It will also be found
that for $N > 6$ the factor $\dfrac{N(N + 1)}{m(N - m)} - 6$ is not greater in absolute value than
$\dfrac{2(N - 2)}{N}$ if

$$\tfrac{1}{4} < \frac{m}{N - m} < 4,$$

i.e. unless one sample is more than four times as big as the other. Thus for such values
and $\gamma_2$ not large, $\theta$ is small, and approximately

$$E(w^2) = \frac{3}{N^2 - 1}. \tag{21.86}$$


Similarly, using the fact that for large $m$ and $N$

$$E(\bar{u} - \bar{z})^{2r} = 1 \cdot 3 \cdot 5 \ldots (2r - 1) \left\{ \left(1 - \frac{m}{N}\right) \frac{\mu_2}{m} \right\}^r,$$

we find approximately

$$E(w^3) = \frac{3 \cdot 5}{(N - 1)(N + 1)(N + 3)}. \tag{21.87}$$

The moments given by (21.83), (21.86) and (21.87) are those of the B-distribution

$$dF = \frac{1}{B(\tfrac{1}{2}, \tfrac{1}{2}N - 1)}\,(1 - w)^{\frac{1}{2}N - 2}\, w^{-\frac{1}{2}}\, dw, \tag{21.88}$$

which can therefore be used to approximate to the distribution of $w$. In point of fact the
approximation seems to be remarkably close.
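$w$ may also be written

$$w = \frac{\dfrac{mn}{m+n}(\bar{u} - \bar{v})^2}{\Sigma(u - \bar{u})^2 + \Sigma(v - \bar{v})^2 + \dfrac{mn}{m+n}(\bar{u} - \bar{v})^2}, \tag{21.89}$$

which shows that $w \leq 1$.
We also have

$$\frac{w}{1 - w} = \frac{\dfrac{mn}{m+n}(\bar{u} - \bar{v})^2}{\Sigma(u - \bar{u})^2 + \Sigma(v - \bar{v})^2}, \tag{21.90}$$

and it is instructive to observe that the function on the right is the same as that of
$t^2/\nu$ of (21.32) with a few changes of notation. A transformation of (21.88) to
“Student's” form will in fact show that we can test $\sqrt{\dfrac{(m + n - 2)\,w}{1 - w}}$ in the $t$-distribution with
$\nu = m + n - 2$; for (21.88) then becomes

$$dF \propto \frac{dt}{\left(1 + \dfrac{t^2}{m + n - 2}\right)^{\frac{1}{2}(m + n - 1)}}, \tag{21.91}$$

where

$$t = \left\{ \frac{(m + n - 2)\,w}{1 - w} \right\}^{\frac{1}{2}}. \tag{21.92}$$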
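
The approximate form of the test requires only the two means and the sums of squares. As a sketch, the quantities (21.89) and (21.92) are computed below for the two samples of Example 21.13; the function name `pitman_w` is ours.

```python
import math


def pitman_w(u, v):
    """Return w of (21.89) and the corresponding t of (21.92)."""
    m, n = len(u), len(v)
    u_bar = sum(u) / m
    v_bar = sum(v) / n
    between = m * n / (m + n) * (u_bar - v_bar) ** 2
    within = sum((x - u_bar) ** 2 for x in u) + sum((y - v_bar) ** 2 for y in v)
    w = between / (within + between)
    t = math.sqrt((m + n - 2) * w / (1 - w))
    return w, t


w, t = pitman_w([0, 11, 12, 20], [16, 19, 22, 24, 29])
print(round(w, 4), round(t, 3))
```

The value of `t` is referred to “Student's” distribution with m + n - 2 = 7 degrees of freedom. With samples as small as these the exact enumeration is of course available, and the approximation is shown only for comparison.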


21.50. A test of a similar kind may be evolved for the product-moment correlation.
Suppose we have two samples $x_1 \ldots x_n$ and $y_1 \ldots y_n$ and calculate

$$r = \frac{\operatorname{cov} xy}{\sqrt{(\operatorname{var} x \, \operatorname{var} y)}}$$

for every possible pairing of the $x$'s and $y$'s, $n!$ in number. As before, if we choose an
$\alpha$ and hence a number $M$ such that $1 - \alpha = M/n!$ we may determine those pairings for
which $r$ is greatest and reject the hypothesis that $x$ and $y$ are independent in such cases
if they fall among the $M$ greatest. Since the denominator of $r$ is constant, this is equivalent
to attributing significance to the values of $|\Sigma xy - n\bar{x}\bar{y}|$ which exceed a given value
determined by $\alpha$.

Taking $\bar{x} = \bar{y} = 0$, without loss of generality we find

$$E(\Sigma xy) = 0, \tag{21.93}$$

$$E(\Sigma xy)^2 = \frac{1}{n - 1}\,\Sigma(x^2)\,\Sigma(y^2), \tag{21.94}$$

and similarly, if $\gamma_1$, $\gamma_2$ are the modified measures of skewness and kurtosis for $x$ (expressed
in terms of $k$-statistics, i.e. $\gamma_1 = k_3/k_2^{3/2}$, $\gamma_2 = k_4/k_2^2$) and $\gamma_1'$ and $\gamma_2'$ those for $y$, it will be found that

$$E(r^2) = \frac{1}{n - 1}. \tag{21.95}$$


E (r*) = = (n — 2) (n — 3) 


3 
Ronee soe 8 eot i 


Thus to order n^! we have 


H 
2) 2——— 
A TI NEM S. (21.97) 
3 
AY a LÁ———— 
EO) = Gone +) 
These are the first four moments of the distribution

$$dF = \frac{1}{B(\tfrac{1}{2}, \tfrac{1}{2}n - 1)}\,(1 - r^2)^{\frac{1}{2}(n - 4)}\, dr, \qquad -1 \leq r \leq 1. \tag{21.98}$$

Thus $r$ may be tested in this distribution or equivalently, putting

$$t = r \left\{ \frac{n - 2}{1 - r^2} \right\}^{\frac{1}{2}}, \tag{21.99}$$

in “Student's” form with $\nu = n - 2$.

In particular, if the numbers $x$ and $y$ reduce to rankings, we have the test already
introduced in 21.41. Compare also the result given for the distribution of Spearman's
$\rho$ in 16.18 (vol. I, p. 401).
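
For a very small sample the population of n! pairings can be written out in full, and the first two moments of r verified by direct averaging. The data in the following sketch are hypothetical and serve only to illustrate the method.

```python
import math
from itertools import permutations


def correlation(x, y):
    """Product-moment correlation of the pairs (x_i, y_i)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical observations.
x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [3.0, 1.0, 5.0, 2.0, 9.0]

# r for every pairing of the x's with the y's, n! = 120 in number.
rs = [correlation(x, perm) for perm in permutations(y)]
mean_r = sum(rs) / len(rs)
mean_r_sq = sum(r * r for r in rs) / len(rs)

observed = correlation(x, y)
# Pairings whose |r| is not less than that observed (small tolerance for rounding).
n_extreme = sum(1 for r in rs if abs(r) >= abs(observed) - 1e-12)
print(len(rs), round(mean_r, 12), round(mean_r_sq, 12), n_extreme)
```

The averages reproduce E(r) = 0 and E(r²) = 1/(n - 1) for any set of values, whatever the parent population; the proportion `n_extreme / len(rs)` plays the part of the exact probability to which (21.98) is an approximation.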


The Combination of Tests 

21.51. It sometimes happens that we have a number of tests of significance, all
yielding various probabilities, which we wish to express as a single probability. Suppose,
for instance, that we conduct an experiment five times and that some test, such as that
of the mean, gives probabilities to the observed deviations of 0.2, 0.8, 0.01, 0.1, 0.03. In
the ordinary way two of these values would be regarded as significant and the other three
not. What conclusion are we to draw as to the five taken together?

Suppose we have $k$ values of the probability, $p_1 \ldots p_k$. The distribution of any
particular $p$ is rectangular, i.e.

$$dF = dp, \qquad 0 \leq p \leq 1.$$

Hence, if $x = -\log p$ the distribution of $x$ is

$$dF = e^{-x}\, dx, \qquad 0 \leq x \leq \infty,$$

and its characteristic function is

$$\phi(t) = \int_0^\infty e^{itx}\, e^{-x}\, dx = \frac{1}{1 - it}.$$

Hence if we write

$$\lambda = -\sum_{j=1}^{k} \log p_j, \tag{21.100}$$

the distribution of $\lambda$ has a characteristic function

$$\phi(t) = \frac{1}{(1 - it)^k}$$

and is therefore given by

$$dF = \frac{1}{\Gamma(k)}\, \lambda^{k-1} e^{-\lambda}\, d\lambda. \tag{21.101}$$

Putting

$$M^2 = 2\lambda = -2\,\Sigma \log p = -2 \log \Pi p, \tag{21.102}$$

we see that the distribution of $M^2$ is

$$dF \propto M^{2k-1} \exp(-\tfrac{1}{2}M^2)\, dM, \tag{21.103}$$

or $M^2$ is distributed as $\chi^2$ with $\nu = 2k$ degrees of freedom.
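
The combination (21.102) is easily carried out numerically. In the sketch below the tail probability of $\chi^2$ is obtained from the finite series which holds when the number of degrees of freedom is even, so that no tables are needed; the function name is ours, and the five probabilities are those quoted at the beginning of this section.

```python
import math


def combine_probabilities(p_values):
    """Return M^2 of (21.102), its degrees of freedom 2k, and P(chi^2 >= M^2).

    For even degrees of freedom 2k the tail probability is
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!, with x = M^2.
    """
    k = len(p_values)
    m_sq = -2.0 * sum(math.log(p) for p in p_values)
    half = m_sq / 2.0
    tail = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))
    return m_sq, 2 * k, tail


m_sq, nu, tail = combine_probabilities([0.2, 0.8, 0.01, 0.1, 0.03])
print(round(m_sq, 2), nu, round(tail, 4))
```

On this reckoning the five results taken together give a small combined probability, although only two of them would be judged significant singly.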


Example 21.14 (K. Pearson, 1933b, quoting data from E. M. Elderton, 1933). 


Pairs of boys were selected in various age-groups and one member of each pair fed
on raw, the other on pasteurised milk. The differences in gain in weight are shown in
the following table, together with the standard errors of the differences based on
large-sample theory.


(1)        (2)         (3)                  (4)            (5)                     (6)
Age        Number      Difference in        Standard       Probability of          $\log_{10} p_i$
(years)    of Pairs.   Weight Gained,       Error of       Observed Difference
                       Raw less             Difference.    or Greater, $p_i$.
                       Pasteurised.

6½          73         - 0.066              0.054          0.8888                  $\bar{1}.9488$
7½          76         + 0.022              0.053          0.3409                  $\bar{1}.5326$
8½          71         - 0.003              0.052          0.5239                  $\bar{1}.7193$
9½          77         + 0.011              0.055          0.4207                  $\bar{1}.6240$
10½         60         + 0.002              0.057          0.4840                  $\bar{1}.6849$

                                                           Total                   $\bar{2}.5096$


The values of $p_i$ in column (5) are obtained by expressing the observed deviations in column
(3) in terms of the standard error in column (4) and hence determining the probability
from the normal integral. We have

$$M^2 = -2\,\Sigma \log_e p = -2\,\frac{\Sigma \log_{10} p}{\log_{10} e} = 6.86, \qquad \nu = 10.$$

The probability of a value of $\chi^2 > 6.86$ for $\nu = 10$ is about 0.74, and the test as a whole
does not support the hypothesis of a differential effect on feeding between the two kinds
of milk.


Nuisance Parameters

21.52. From the foregoing it will have been clear that in the theories of both estima- 
tion and significance one of the main problems is to find a distribution which is independent 
of certain unknown parameters in the parent population. Parameters of this kind, neces- 
sary as they are in the specification of the parent and the precise formulation of our problem, 
can be a nuisance when we are seeking to make exact statements about some other para- 
meter on which interest is focussed. For this reason they have been named nuisance 
parameters. It may be useful if at this point we summarise the methods available for 
getting rid of them. 


(a) First of all there is the process of “ Studentisation ", whereby we can remove 
scale parameters from the sampling distribution by a suitable choice of statistic. (Cf. 
19.26.) 

(b) Secondly, we may restrict the inference to a sub-population which is conditioned 
by having certain values in common with the observed sample. It sometimes happens 
that the distribution in this sub-population does not contain the nuisance parameters, 
whereas a distribution in the full population would do so (21.47). 

(c) In the comparison of two samples, or even the testing of a single sample involving
an unknown mean, that parameter may be eliminated by differencing (21.27). As regards
the case of the single sample, it is clear that if $x_1 \ldots x_n$ are independent and $n$ is even,
the values $x_1 - x_2$, $x_3 - x_4$, $\ldots$, $x_{n-1} - x_n$ will also be independent and be distributed
with zero mean (though of course there are only $\tfrac{1}{2}n$ of them).

(d) Transformations of the variate may sometimes either eliminate the nuisance
parameter altogether or reduce its importance. The most noteworthy case is Fisher's
transformation of the correlation coefficient (14.18, vol. I, p. 345). The transformed
function $z - \zeta$ is distributed nearly normally with variance $1/(n - 3)$, so that the difference
of two correlations when transformed does not involve the common value of $\zeta$.
(Cf. Example 14.8.)

(e) We may find distributions which are independent of the unknown parameters, 
and even of the population, by using the methods of ranking or considering partitions 
(21.41, 21.48). 

(f) The fiducial argument, in at least one known case, gives a test independent of 
unknown parameters, namely the Behrens test (20.13). 


It must be realised, however, that all these types of inference do not stand on equal
footings. In particular (e) requires further examination, as we proceed to show.


21.53. We may now review the many different tests which have been described in
this chapter and consider more closely the type of reasoning on which they are based.
We may group our tests broadly into two classes, those which give a direct test of a given
value of a parent parameter and those which do not.

The first class rests on a type of inference which we have discussed fully in connection
with the problem of estimation. There is, in fact, only a difference in viewpoint, and little
or none in essential ideas, between estimating a parameter by assigning a range to
acceptable values (whether by confidence intervals or fiducial intervals) and ascertaining whether
some prior value lies in that range. The significance of parameters in large samples, the
test of the mean in normal samples by “Student's” distribution, the test of a correlation
coefficient in normal samples, and others of the same kind relating to a specified parameter
have the same logical foundation as the theory of confidence intervals or the theory of
fiducial intervals, whichever is preferred. They all provide for the consideration of alternative
values of the parameter.

21.54. The second group of tests are not, on the face of it, concerned with the value 
of a parameter in a parent population, and some of them take no account of possible alter- 
native hypotheses. Consider, for example, a test of normality or a test of randomness. 
The hypothesis is that the population is normal or the sampling is random, as the case 
may be, but this does not specify a parameter. What alternatives to normality or to 
randomness are we considering, if any ? We must have the existence of such alternatives 
in mind, however vaguely, for otherwise we should not be testing these particular 
hypotheses. But can we say what they are? And if not, do our inferences remain valid?
When working with a probability α shall we still be right in a proportion α of the cases in
the long run?


21.55. The kind of argument we have used in all these cases is this: on the given 
hypothesis the observed sample and all samples providing a greater value of the statistic 
being used for the test have a small probability. Therefore we reject the hypothesis. 

We may note at once that in rejecting the hypothesis we do so in favour of another 
hypothesis for which the observations are more probable. We may not express this thought 
explicitly, but it is there. The various statistics we use for testing normality, for instance 
b, can arise with greater probability from other populations which are skew or have a 
marked deviation from mesokurtosis ; the fact is assumed as self-evident (as indeed it 
is) and hence, if the statistic is improbable for the normal case there will be non-normal 
cases of greater probability. We remark, nevertheless, that the actual probability α is
calculated on the normal hypothesis and does not hold for the non-normal cases. Thus
we can no longer assert that we are right in proportion α of the cases. We are therefore
relying on a less definite principle of inference to the effect that we reject a hypothesis 
which gives an improbable value to observation, provided that there exists some other 
hypothesis which gives a more probable value. 


21.56. A similar argument applies to tests of randomness. It is obvious that many 
other methods of generating a series exist which give a greater probability to a systematic 
series than the random method, and in rejecting the latter we do so more or less consciously 
in favour of the former. Our intuitive feelings on the point lead us to apply one test when 
we have the possibility of systematic order in mind (the ranking test) and another when 
we are interested in oscillations (the phase test). What we are doing, in effect, is selecting 
the test of randomness which we feel to discriminate best between the hypothesis of 
randomness and the alternative possibilities. 


21.57. Although, therefore, much remains to be done in putting tests of normality, 
randomness and goodness of fit on a formal logical basis, there do not appear to be any 
serious difficulties in doing so insofar as the specification of alternative hypotheses is con- 
cerned. But there remains the difficulty hinted at at the beginning of 21.55. In the 
majority of cases we have a probability 1 − α that the observed statistic t₀ will be exceeded,
and if this is small reject the hypothesis. But why exceeded? Why reject the hypothesis
because of the improbability of a number of events which have not happened?

Here also it seems that a closer inquiry into the logic of the process would be worth 
while. We have seen how it can be justified by confidence-interval or fiducial theory 
when a parameter is under consideration. When no parameter is specified, the process
must, in the present state of our knowledge, rest on more intuitive ideas. My own view 
is that, in a vague kind of way, we are really considering the range of values of a parameter 
without realising it. In selecting a statistic to carry out the test, we usually relate it to 
the sort of effect we are expecting to divert the real state of affairs from those of 
our hypothesis. For instance, if we suspect cyclical effects in a random series we base 
a test on oscillations in that series. The further the series deviates from randomness the 
greater will be the value of our statistic ; and consequently, if we could measure deviation 
from randomness (in the direction of cyclicality), we should have a parameter which could 
be located in a range in the manner of confidence intervals. Such a range would exclude 
the larger values of our statistic if it can be regarded in any sense as estimating the para- 
meter (or, more generally, as increasing with it); and hence the procedure of rejecting the 
hypothesis if the statistic is among these large values may be justified. 


21.58. It is for this reason that we began the chapter by defining tests of significance 
in relation to a parameter-value given a priori. It seems probable that in the ultimate 
analysis no other definition will be satisfactory. The fact that in this chapter we have 
given tests of hypotheses which do not appear to specify a parameter value is, I think, 
merely a reflection of the fact that the nature of those hypotheses and the inferences about 
them are not usually understood clearly but are based on more or less intuitive ideas. It 
is probable that many of these ideas are sound and can be given explicit logical foundation ;
but the matter awaits investigation by the statistical logician. 


21.59. There remains for consideration the type of inference used in Pitman’s tests 
(21.48 and 21.49). These are of the character of tests of randomness. Given a set of 
values, we consider all the arrangements in which they could have happened and reject 
the hypothesis if the observed arrangement is improbable. Here again, as it seems to me, 
there is a suppressed series of alternative hypotheses which would make the observed 
value more probable; and in choosing the test, such as the “spread” or the high value 
of a correlation, we are intuitively relating the magnitude of a statistic to the deviation 
from randomness. Pitman himself has shown, however, that when the hypothesis is 
definite and specifies the difference of two means, the tests give confidence intervals in the 
ordinary way (cf. Exercise 21.15.) 
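
The reasoning of this section, in which all arrangements of the given values are considered and the observed one is judged against them, may be sketched as follows. The sketch assumes the difference of two sample means as the criterion, which is not Pitman's statistic w of (21.82) but follows the same logic; the function name and the two small samples are hypothetical.

```python
from itertools import combinations

def permutation_test_mean_difference(x, y):
    """Exact two-sided permutation test for a difference of means.

    Every division of the pooled values into groups of the observed
    sizes is treated as equally probable on the hypothesis that the
    two samples come from the same population.
    """
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    total = sum(pooled)
    observed = abs(sum(x) / n1 - sum(y) / n2)
    arrangements = 0
    as_extreme = 0
    for idx in combinations(range(n1 + n2), n1):
        s1 = sum(pooled[i] for i in idx)
        d = abs(s1 / n1 - (total - s1) / n2)
        arrangements += 1
        # Small tolerance guards against floating-point ties.
        if d >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / arrangements

# Hypothetical samples, for illustration only.
p = permutation_test_mean_difference([1, 2, 3, 4], [5, 6, 7, 8])
print(p)
```

The number of arrangements grows rapidly with the sample sizes, so complete enumeration of this kind is practicable only for small samples.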

We shall resume the general theory of tests of significance in Chapter 26. 


NOTES AND REFERENCES 


For the use of the t-distribution in non-normal cases see Geary (1936b) and Bartlett
(1935a), the latter of whom shows that, for moderate samples, departures from meso- 
kurtosis are not very serious. For approximations to t in the normal case see Hendricks 
(1936) and Hotelling and Frankel (1938). For approximations to the z-distribution see 
Cochran (19400), Cornish and Fisher (1937), and Paulson (1942). See also references to 
Chapter 23. 

For the further theory of the χ²-test see Neyman and Pearson (1928, 1931a) and for
another test of goodness of fit Neyman (1937a). The theory of 21.44 has been studied
by a number of writers, notably by André (1884), Kermack and McKendrick (1936, 1937),
and Wallis and Moore (1941).

The amalgamation of tests given in 21.51 was apparently first given by Fisher in an 
early edition of Statistical Methods for Research Workers and was studied in detail by
K. Pearson (1933b) under the title of the P_λ-test, and by E. S. Pearson (1938).

For a test of significance of the difference of two variances in samples from a bivariate 
normal population see Hirschfeld (1937), Finney (1938), Pitman (1939c), Morgan (1939), 
and De Lury (1938); and see Exercise 21.3. 

For the tests by Pitman, see his papers of 1937a, 1938. The similar problem in the 
testing of homogeneity in the analysis of variance has also been studied—see references 
to Chapters 23 and 24. 

For the test of difference of means when variances are unequal from the point of view 
of confidence intervals see Welch (1938b) and the appendix to this paper by Miss Tanburn.


EXERCISES 
21.1. For the population represented approximately by 
1 K MES 
F = 1 z 3 2 da 
dF Gs) { $ (3a — x je da, 


show that, if «2 is negligible, the joint probability of a sample z, . . . 2, differs from that 
if x, is zero by a term 


— P - PL (— $229) dz, . . . d. 
(27)? j j-1 
By the transformation 


j-1 


Jic ve — t) 


ya ci + ta — 22,) 


V6 
1 
Yn = ee Jdem 4s « +p) 
and the further transformation 
y; =p Sin p-s sin d, , . - . sin d, sin $o 
Yo = p SiN $, 5 sind, 4 >+- sin d, cos $o 
Ys =p Sin d, s sin d, + + + COS dx 


Yn—1 =P COS $us, 
show that the corrective term to the distribution of “ Student's " t is 


rai AT 3,5. 3n paren WIS 
af (ser rt ta) exp { (1+ ; p’ dp 
and hence obtain equation (21.11). 
(Geary, 1936.) 


21.2. By the polar transformation of the type of the previous exercise applied to 
all n variates show that if a random sample is drawn from a normal population with zero 
mean the frequency element may be written as 
L _ pt e-¥ dp ds sin d dé sin? Ja dpa» « . Sin"? duca da-ze 
(27) 
Hence if w = ERAT where s² is the sample variance, the distribution of w is independent


2 
of that of s. Hence show that for the distribution of w, writing a = Yi zn 


{T (fn + 1)}? 2 
juo I(n-c1 Xn 


1 
pa == (n pata) 
— I Cop gn aram (Mt) 
Hs ai n Lu a H xi 


Hre A {3n™ + (8a? + 3) n9 + 632 n9) + at no) [11 
Hence show that e n = 50, vb. = — 0-24 and fj, = 3-10, indicating fairly rapid tendency 


to normality. 
(Geary, 1935a). 


21.3. Show that in samples from a normal bivariate population c 
H a* — 2pry 
da: d 
as a xp | Ale faa o H ne 
the functions welt Wy e. 


1 
Oo; 0z Gi Oz 
are distributed independently and that their correlation coefficient R may be written 
= a —a 
V (a +a)? — 4aar? Y 
moi PACET 
p tw Sy a 
and r is the correlation between the observed z's and y's. Hence show that 
—Rv(n — 2) _ (a —a)v(n — 2) À 
Vü-—mR) (4 —r3)ax) Y 1 
is distributed as “Student's” t with n − 2 degrees of freedom. Show how to test the
ratio « from this result. 


(Pitman, 1939c. The test has the remarkable property of being independent of the 
parent correlation p.) 


21.4. If an even number n of members of a sample come from a population with 
mean u, show how to find a sample of half the size distributed with twice the variance 
about zero mean. Hence show how to extend the result of Exercise 21.2 to the case where 
the population mean is not zero. 


21.5. If a parameter admits of a sufficient estimator, show that a test of its significance
can be derived direct from the likelihood function. 


21.6. Derive equations (21.47) and (21.48). 
21.7. Let $l_{11}, l_{12}, \ldots, l_{1,n-1}$ be (n − 1) linear functions of the observations which
are orthogonal to one another and to $\bar{x}_1$, and let them have zero mean and variance $\sigma_1^2$.
Similarly define $l_{21}, \ldots, l_{2,n-1}$.


Then, in two samples of n from normal populations with equal means and variances
$\sigma_1^2$ and $\sigma_2^2$, the function
$$\frac{\sqrt{n}\,(\bar{x}_1 - \bar{x}_2)}{\left\{\sum_{j} (l_{1j} + l_{2j})^2/(n - 1)\right\}^{1/2}}$$
will be distributed as “Student's” t with n − 1 degrees of freedom.
(Bartlett, 1937c, and Welch, 1938b. The test does not depend on the ratio $\sigma_1^2/\sigma_2^2$ and
can be extended to the case of unequal sample numbers, but only at the expense of losing
efficiency in the sense that the degrees of freedom number one less than the lower of the sample
numbers.)


21.8. Given two samples of n₁, n₂ members from normal populations with unequal
variances, show that by picking n₁ members at random from the n₂ (where n₂ > n₁) and
pairing them at random with the members of the first sample, a test of significance of
difference of means can be based on “Student's” distribution independently of the variance
ratio in the populations. (This test, again, is exact, but sacrifices the information of
n₂ − n₁ members of the second sample.)


21.9. If z is the ratio of the sample mean to sample standard deviation in normal 
samples, and n is large enough for the distribution of the variance to be regarded as normal,


show that 


z " t 
e, V (20) Verran ™ vm) Verai} 
is distributed approximately normally with zero mean and unit variance, where 
n 
E r(3) ) 77 
C EET RET ~1— "x 


\ 4n 32m?" 


re) 


(Hendricks, 1936.)


21.10. If x, y have a continuous frequency function f(x, y), their characteristic
function is
$$\phi(u, v) = \int\!\!\int \exp(iux + ivy)\, f(x, y)\,dx\,dy.$$
Show that the distribution of x when y is given has a characteristic function
$$\phi(u \mid y) = \frac{\int_{-\infty}^{\infty} e^{-ivy}\,\phi(u, v)\,dv}{\int_{-\infty}^{\infty} e^{-ivy}\,\phi(0, v)\,dv}.$$
(Bartlett, 1938b.)


21.11. If a set of parameters $\theta_1, \ldots, \theta_k$ admit of a set of sufficient estimators, show
that conditional inferences independent of $\theta_1, \ldots, \theta_k$ are possible, the conditions being
that the estimators are constant for the samples concerned. Conversely, if conditional
inference is possible, the irrelevant parameters must admit a set of sufficient estimators.
(Bartlett, 1937c.)


21.12. In a normal sample of n values show that if 
EU Uy — Va 


~ (2n) 


n 
and ns’? = Ea? — nz’? = 4 (a, + 22)? + J 2, 
j-8 


where z,, 7, are two sample values taken at random, then 
= 


v 
E 


DET 


is distributed in the same form as “ Student's" ratio z = 
zero. Show further that 


LAETI 


, when the parent mean is 


|z| <1. 
(Neyman, Lectures and Conferences on Mathematical Statistics, 1938. The example shows 
that if z is “ significantly " large, ¢ must be small and hence the two criteria based on z and ¢ 
lead to opposite conclusions.) 


21.13. In a 2 x 2 contingency table, show that the border relative frequencies 
are, on the hypothesis of independence, sufficient estimators for the probability of success 
of the two attributes defining the table. Hence derive the exact test of significance in 
such a table as a conditional inference. (The exact test is given in 12.16, vol. I, p. 303.) 

: (Bartlett, 1937c.) 


21.14. If two samples are drawn from a bivariate normal population, $v_{12}$ and $v'_{12}$
are their covariances, $V_{11}$ and $V_{22}$ are the variances of the pooled samples, and $V_{12}$ its
covariance, show that the distribution function

$$F(v_{12}, v'_{12} \mid V_{11}, V_{12}, V_{22})$$
is independent of the parent variances and correlation. Hence show that the distribution
would provide a test of the difference of sample covariances.
(Bartlett, 1937c.) 


21.15. If two samplesz, . . . z, and y, . . . y, are drawn from populations which 
differ only in location and the difference in means is d, show by considering the values 
typified by z + d and y how to set confidence limits to d, based on the distribution of 
w of equation (21.82). 

(Pitman, 1937a.) 


21.16. In the previous exercise show that the confidence limits for d are the same 
as those based on “Student's” distribution in the case of normal populations with different
means and identical variances (equation (21.32)). Explain why the latter test is only 
valid for normal populations, whereas the former is valid for any population. 


CHAPTER 22 


REGRESSION 


The Analytical Theory of Regression 

22.1. When considering the theory of correlation in Chapters 14 and 15 we introduced 
the concept of linear regression of one variate on a set of “ independent" variates. We 
shall now study this subject more fully and extend the theory to the case where the regres- 
sion lines are not straight. In the first instance we confine our attention to bivariate 
populations, but the majority of our results are easily generalised to the multivariate case. 

In speaking of one variate as “dependent” and the others as “independent” we
introduce what may be a source of confusion. In general, all the variates are dependent 
in the statistical sense, each on the others, and in special cases may even be functionally 
dependent. In selecting one for separate consideration and in discussing its dependence 
on the others we are usually attempting to solve a problem in estimation : for given values 
of the other variates, what is the best estimator of the “ dependent " variate, or its central 
value in the distribution which it has for such given values? The idea of “ given " values, 
that is to say values which can be selected at will, leads to our referring to them as “ inde- 
pendent ”, though they may be statistically dependent on one another. It might perhaps 
be better to use different words, but the practice is so common that we make no attempt 
to improve it. Once the point has been understood no difficulty arises in practice.


22.2. If we have two variates x, y with frequency function f(x, y), then for any
fixed value of y the mean of x, say $\bar{x}_y$, is given by

$$\bar{x}_y = \int_{-\infty}^{\infty} x\, f(x, y)\,dx \Big/ \int_{-\infty}^{\infty} f(x, y)\,dx. \qquad (22.1)$$

The expression on the right is a function of y and thus the points whose co-ordinates
are ($\bar{x}_y$, y) have a locus which is, in general, a smooth curve. This curve is defined as the
line of regression of x on y, and may be written

$$X = \frac{\int_{-\infty}^{\infty} x\, f(x, Y)\,dx}{\int_{-\infty}^{\infty} f(x, Y)\,dx}, \qquad (22.2)$$

where X, Y are the current co-ordinates. Similarly there will be a line of regression of
y on x given by

$$Y = \frac{\int_{-\infty}^{\infty} y\, f(X, y)\,dy}{\int_{-\infty}^{\infty} f(X, y)\,dy}. \qquad (22.3)$$

We shall take Y to represent the dependent variate throughout this chapter.
22.3. We may also consider the more general curves typified by

$$Y = \frac{\int y^r f(X, y)\,dy}{\int f(X, y)\,dy}, \qquad (22.4)$$

the regression now being of the rth moment of y on x. If r = 1 we have the regression
of the first moment, or simply the regression. If r = 2 and y is measured from the mean
we have the so-called scedastic curve of y on x,

$$Y = \frac{\int (y - \bar{y})^2 f(X, y)\,dy}{\int f(X, y)\,dy}, \qquad (22.5)$$

which shows how the variance of y varies with x. Other forms which have been studied
are the clitic curve

$$Y = \frac{\int (y - \bar{y})^3 f(X, y)\,dy}{\int f(X, y)\,dy} \qquad (22.6)$$

and the kurtic curve

$$Y = \frac{\int (y - \bar{y})^4 f(X, y)\,dy}{\int f(X, y)\,dy}. \qquad (22.7)$$

These curves correspond to the moments of a univariate distribution, and the main 
characteristics of a bivariate form may be studied with their aid in much the same way 
as the lower moments can be used to summarise the properties of a univariate form. 
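
For a distribution given as a frequency table the integrals above become sums over each array, and the curves may be computed as in the following sketch. It assumes the table is held as a mapping from each value of x to the frequencies of y in that array; the function name and the small table are hypothetical.

```python
def array_moment_curves(table):
    """Moment curves of y on x from a two-way frequency table.

    `table` maps each x to a dict {y: frequency}. For each array the
    mean gives a point on the regression curve, and the second, third
    and fourth moments about that mean give points on the scedastic,
    clitic and kurtic curves respectively.
    """
    curves = {}
    for x, row in sorted(table.items()):
        total = sum(row.values())
        mean = sum(y * f for y, f in row.items()) / total
        moments = [
            sum((y - mean) ** r * f for y, f in row.items()) / total
            for r in (2, 3, 4)
        ]
        curves[x] = {
            "regression": mean,
            "scedastic": moments[0],
            "clitic": moments[1],
            "kurtic": moments[2],
        }
    return curves

# Hypothetical frequency table, for illustration only.
example = {0: {0: 1, 1: 2, 2: 1}, 1: {1: 1, 3: 1}}
curves = array_moment_curves(example)
print(curves[0], curves[1])
```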


22.4. It is interesting to remark that, just as we can find the moments direct from 
the characteristic function, so also we may ascertain the regressions of moments from 
the bivariate characteristic function, even when the distribution function itself is not 


explicitly given. 
Let us write the frequency function in the form
$$f(x, y) = g(x)\, g_x(y), \qquad (22.8)$$

where g(x) is the total frequency for any given x and $g_x(y)$ is the frequency of y for any
given x. In the notation of the theory of probability we should write this
$$f(x, y) = g(x)\, g(y \mid x).$$
The characteristic function of x and y is then
$$\phi(t_1, t_2) = \int\!\!\int \exp(it_1 x + it_2 y)\, g(x)\, g_x(y)\,dx\,dy
 = \int e^{it_1 x} g(x)\, \phi_x(t_2)\,dx, \qquad (22.9)$$

where
$$\phi_x(t_2) = \int e^{it_2 y} g_x(y)\,dy \qquad (22.10)$$

and is the c.f. of y for a given x.
If the rth moment of y about the origin for a given x is $\mu'_{rx}$ we have
$$i^r \mu'_{rx} = \left[\frac{\partial^r \phi_x(t_2)}{\partial t_2^r}\right]_{t_2 = 0},$$
and hence, from (22.9),
$$\left[\frac{\partial^r \phi(t_1, t_2)}{\partial t_2^r}\right]_{t_2 = 0} = i^r \int e^{it_1 x} g(x)\, \mu'_{rx}\,dx. \qquad (22.11)$$

Thus, by the Inversion Theorem,
$$g(x)\, \mu'_{rx} = \frac{i^{-r}}{2\pi} \int_{-\infty}^{\infty} e^{-it_1 x} \left[\frac{\partial^r \phi(t_1, t_2)}{\partial t_2^r}\right]_{t_2 = 0} dt_1, \qquad (22.12)$$

subject, of course, to conditions of existence. This gives us the required expression for
$\mu'_{rx}$ in terms of x, and the regression can be written down at once.


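22.5. Since
$$\phi(t_1, t_2) = \exp\left\{\sum_{r,s} \kappa_{rs}\, \frac{(it_1)^r}{r!}\, \frac{(it_2)^s}{s!}\right\},$$
we have
$$\left[\frac{\partial \phi}{\partial t_2}\right]_{t_2 = 0}
 = i \exp\left\{\sum_{r=0}^{\infty} \kappa_{r0} \frac{(it_1)^r}{r!}\right\} \sum_{j=0}^{\infty} \kappa_{j1} \frac{(it_1)^j}{j!}
 = i\, \phi(t_1, 0) \sum_{j=0}^{\infty} \kappa_{j1} \frac{(it_1)^j}{j!}, \qquad (22.13)$$

and $\phi(t_1, 0)$ may be written $\phi(t_1)$, being the characteristic function of g(x). We also
have, subject to existence conditions,
$$(-1)^j \frac{d^j g(x)}{dx^j} = \frac{1}{2\pi} \int_{-\infty}^{\infty} (it_1)^j e^{-it_1 x} \phi(t_1)\,dt_1. \qquad (22.14)$$
Hence, from (22.12), (22.13) and (22.14) we find
$$g(x)\, \mu'_{1x} = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-it_1 x} \phi(t_1) \sum_{j=0}^{\infty} \kappa_{j1} \frac{(it_1)^j}{j!}\,dt_1
 = \sum_{j=0}^{\infty} \frac{\kappa_{j1}}{j!} (-1)^j \frac{d^j g(x)}{dx^j}, \qquad (22.15)$$
provided that the interchange of summation and integration in the last step is legitimate.
Thus we have, for the regression of the mean,
$$Y = \sum_{j=0}^{\infty} \frac{\kappa_{j1}}{j!}\, \frac{(-1)^j}{g(X)}\, \frac{d^j g(X)}{dX^j}. \qquad (22.16)$$
This notable result is due to Wicksell (1934b). The expansion is valid if the cumulants
exist and if g(x) and its derivatives are continuous in the range and zero at its extremes ;
for then the interchange of summation and integration in arriving at (22.15) is legitimate.
In particular, if g(x) is normal and in standard measure we have
$$Y = \sum_{j=0}^{\infty} \frac{\kappa_{j1}}{j!}\, H_j(X), \qquad (22.17)$$

where $H_j(x)$ is the Tchebycheff-Hermite polynomial of order j (6.20, vol. I, p. 145).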
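Example 22.1
For the bivariate normal distribution about the mean we have
$$dF = k \exp\left\{-\frac{1}{2(1 - \rho^2)} \left(\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1 \sigma_2} + \frac{y^2}{\sigma_2^2}\right)\right\} dx\,dy,$$
$$\phi(t_1, t_2) = \exp\left\{-\tfrac{1}{2}\left(\sigma_1^2 t_1^2 + 2\rho \sigma_1 \sigma_2 t_1 t_2 + \sigma_2^2 t_2^2\right)\right\}.$$
Hence
$$\left[\frac{\partial \phi}{\partial t_2}\right]_{t_2 = 0} = -\rho \sigma_1 \sigma_2 t_1 \exp\left(-\tfrac{1}{2} \sigma_1^2 t_1^2\right),$$
and from (22.12)
$$g(x)\, \mu'_{1x} = \frac{i}{2\pi}\, \rho \sigma_1 \sigma_2 \int_{-\infty}^{\infty} t_1 \exp\left(-\tfrac{1}{2} \sigma_1^2 t_1^2 - it_1 x\right) dt_1
 = \frac{\rho \sigma_2 x}{\sigma_1^2}\, \frac{1}{\sigma_1 \sqrt{(2\pi)}} \exp\left(-\frac{x^2}{2\sigma_1^2}\right).$$
Hence
$$\mu'_{1x} = \rho\, \frac{\sigma_2}{\sigma_1}\, x$$
and
$$Y = \rho\, \frac{\sigma_2}{\sigma_1}\, X,$$

the familiar relation of linearity for the regression of the mean of the normal distribution.
Alternatively, direct from (22.17) we have, since $\kappa_{j1} = 0$, j > 1,
$$Y = \rho\, \frac{\sigma_2}{\sigma_1}\, X, \text{ as before.}$$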
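Example 22.2 (Wicksell, 1934b)


Consider the frequency distribution of $\xi = \tfrac{1}{2}\Sigma(x^2)$ and $\eta = \tfrac{1}{2}\Sigma(y^2)$ where x, y are
samples of n from the bivariate normal population
$$dF \propto \exp\left\{-\frac{1}{2(1 - \rho^2)} \left(x^2 - 2\rho xy + y^2\right)\right\} dx\,dy.$$
The characteristic function is
$$\phi(t_1, t_2) = \left\{(1 - \theta_1)(1 - \theta_2) - \rho^2 \theta_1 \theta_2\right\}^{-n/2},$$

where $\theta_1 = it_1$ and $\theta_2 = it_2$.
The distribution function cannot be expressed in a simple form, but we may determine
the regressions without it. We have
$$\left[\frac{\partial^r \phi}{\partial \theta_2^r}\right]_{\theta_2 = 0} = \frac{\Gamma(\tfrac{1}{2}n + r)}{\Gamma(\tfrac{1}{2}n)}\, \frac{\{1 - (1 - \rho^2)\theta_1\}^r}{(1 - \theta_1)^{\frac{1}{2}n + r}}.$$
Thus, from (22.12)
$$g(\xi)\, \mu'_{r\xi} = \frac{\Gamma(\tfrac{1}{2}n + r)}{\Gamma(\tfrac{1}{2}n)}\, \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} \frac{e^{-\theta_1 \xi}\, \{1 - (1 - \rho^2)\theta_1\}^r}{(1 - \theta_1)^{\frac{1}{2}n + r}}\,d\theta_1.$$
The integrals may be evaluated by successive application of
$$\frac{1}{2\pi i} \int_{-i\infty}^{i\infty} \frac{e^{-\theta_1 \xi}}{(1 - \theta_1)^k}\,d\theta_1 = \frac{\xi^{k-1} e^{-\xi}}{\Gamma(k)},$$

and we find, for the regression of η on ξ,
$$\mu'_{1\xi} = \rho^2 \xi + \tfrac{1}{2}n(1 - \rho^2),$$
$$\mu_{2\xi} = \mu'_{2\xi} - (\mu'_{1\xi})^2 = (1 - \rho^2)\left\{2\rho^2 \xi + \tfrac{1}{2}n(1 - \rho^2)\right\}.$$
Thus the regressions of both mean and variance of η on ξ are linear.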


Fitting of Curvilinear Regression Lines 

22.6. From the practical point of view the case we have just considered, namely, 
the one where the distribution or characteristic function is given, is exceptional. The 
determination of regression curves has, in the majority of cases, to be carried out from 
numerically specified material, which we shall consider in the remainder of the chapter. 
We shall confine our attention to the regression of the mean. 

In general the means of arrays will not lie exactly on a smooth curve (unless of course 
we choose a curve of order equal to the number of points to be fitted, less one). Nor do 
we know a priori what is the appropriate degree of a polynomial which will approx- 
imately represent the regression line. Let us, however, assume that the regression can 
be represented by a polynomial of order p: 

$$Y = a_0 + a_1 X + a_2 X^2 + \ldots + a_p X^p. \qquad (22.18)$$
We will consider later how the appropriate value of p is to be determined in particular
cases. Our problem is to determine the coefficients a from the data. As usual, we appeal
to the principle of least squares, that is to say, we find the values of the a's which will
minimise

$$U = \Sigma (y - a_0 - a_1 x - \ldots - a_p x^p)^2, \qquad (22.19)$$
the summation extending over the sample values.

Differentiating with respect to $a_j$ we have

$$\Sigma(x^j y) - a_0 \Sigma(x^j) - a_1 \Sigma(x^{j+1}) - \ldots - a_p \Sigma(x^{j+p}) = 0,$$
and similar equations for j = 0, . . . p. Writing the moments without primes for simplicity
and letting $\mu_j$ represent the jth moment of x, and $\mu_{j1}$ the bivariate moment
$\Sigma(x^j y)$, we have
$$\begin{aligned}
a_0 \mu_0 + a_1 \mu_1 + \ldots + a_p \mu_p &= \mu_{01} \\
a_0 \mu_1 + a_1 \mu_2 + \ldots + a_p \mu_{p+1} &= \mu_{11} \\
\ldots \qquad & \\
a_0 \mu_p + a_1 \mu_{p+1} + \ldots + a_p \mu_{2p} &= \mu_{p1}
\end{aligned} \qquad (22.20)$$

Writing now
$$\Delta^{(p)} = \begin{vmatrix}
\mu_0 & \mu_1 & \ldots & \mu_p \\
\mu_1 & \mu_2 & \ldots & \mu_{p+1} \\
\vdots & & & \vdots \\
\mu_p & \mu_{p+1} & \ldots & \mu_{2p}
\end{vmatrix} \qquad (22.21)$$

and $\Delta_j^{(p)}$ for the determinant obtained by substituting the product-moments $\mu_{01} \ldots \mu_{p1}$
for the (j + 1)th column, we have, as the solution of (22.20),
$$a_j = \frac{\Delta_j^{(p)}}{\Delta^{(p)}}. \qquad (22.22)$$
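
The solution of the normal equations (22.20) may be sketched numerically as follows. The sketch forms the moments from the data and solves the equations by elimination rather than by determinants, which is an assumption of convenience and not the method of the text; the function names are hypothetical. The data are those of Table 22.1 (Example 22.3 below), with temperature taken in units of 100 degrees to keep the moments of moderate size.

```python
def solve_linear(a, b):
    """Solve the linear system a x = b by elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_polynomial(xs, ys, p):
    """Least-squares coefficients a_0 ... a_p from the normal equations."""
    n = len(xs)
    # Moments of x up to order 2p and product-moments of x**j and y.
    mu = [sum(x ** k for x in xs) / n for k in range(2 * p + 1)]
    mu_y = [sum(y * x ** j for x, y in zip(xs, ys)) / n for j in range(p + 1)]
    matrix = [[mu[j + k] for k in range(p + 1)] for j in range(p + 1)]
    return solve_linear(matrix, mu_y)

# Data of Table 22.1: temperature (degrees F.) and percentage loss in weight.
temps = [100, 105, 110, 115, 121, 132, 144, 153,
         163, 179, 191, 203, 212, 226, 237, 251]
loss = [3.71, 3.81, 3.86, 3.93, 3.96, 4.20, 4.34, 4.51,
        4.73, 5.35, 5.74, 6.14, 6.51, 6.98, 7.44, 7.76]
scaled = [t / 100 for t in temps]

line = fit_polynomial(scaled, loss, 1)
print([round(a, 3) for a in line])
```

For p = 1 the coefficients agree with the straight line (22.42) to the accuracy there quoted.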
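22.7. It might appear that this solution could break down if $\Delta^{(p)} = 0$. Such a
thing is not possible, however, except in the most trivial case. In fact, if the distribution
function of the x's is G(x), we have for $\Delta^{(p)}$

$$\Delta^{(p)} = \begin{vmatrix}
\int dG & \int x\,dG & \ldots & \int x^p\,dG \\
\int x\,dG & \int x^2\,dG & \ldots & \int x^{p+1}\,dG \\
\vdots & & & \vdots \\
\int x^p\,dG & \int x^{p+1}\,dG & \ldots & \int x^{2p}\,dG
\end{vmatrix}$$
or, if we take a separate variable of integration $x_j$ in the (j + 1)th row and write
$$D = \begin{vmatrix}
1 & x_0 & \ldots & x_0^p \\
1 & x_1 & \ldots & x_1^p \\
\vdots & & & \vdots \\
1 & x_p & \ldots & x_p^p
\end{vmatrix},$$
$$\Delta^{(p)} = \int \ldots \int x_1 x_2^2 \ldots x_p^p\, D\; dG_0\, dG_1 \ldots dG_p.$$

If we now permute the suffixes of the x's in all possible ways and sum the (p + 1)! resultants
we obtain, in virtue of the definition of a determinant,

$$(p + 1)!\, \Delta^{(p)} = \int \ldots \int D^2\; dG_0\, dG_1 \ldots dG_p \qquad (22.23)$$
and hence $\Delta^{(p)}$ is essentially positive.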


22.8. From (22.18) and (22.22) we see that the regression line may be written

$$\begin{vmatrix}
Y & 1 & X & \ldots & X^p \\
\mu_{01} & \mu_0 & \mu_1 & \ldots & \mu_p \\
\mu_{11} & \mu_1 & \mu_2 & \ldots & \mu_{p+1} \\
\vdots & & & & \vdots \\
\mu_{p1} & \mu_p & \mu_{p+1} & \ldots & \mu_{2p}
\end{vmatrix} = 0. \qquad (22.24)$$
This is a formal solution of our problem. The moments μ can be obtained from observation,
and equation (22.24) then gives the regression line.
It will be observed that in order to preserve the symmetry we have written $\mu_0$ for
the total frequency unity.


22.9. A somewhat different approach leads to the same solution. If we assume 
that the regression line is a parabolic curve of order p, we may find the coefficients by the 
principle of moments. This would lead us to identify the lower moments 

$$\Sigma(x^j y) = \Sigma\{x^j (a_0 + a_1 x + \ldots + a_p x^p)\}$$
as far as was necessary to determine the a’s. This clearly leads back to equation (22.20). 


Orthogonal Polynomials 
22.10. The use of equation (22.24) in practice is subject to one serious drawback. 
If we have a set of data and no guide, apart from inspection, to the appropriate value of
p, the only course is to fit curves of order 1, 2, 3, . . . and so forth, until we reach the point
when further terms do not improve the fit. Every time we add a new term the determinantal
arithmetic has to be done afresh. To obviate this nuisance we shall consider the
regression line in the form

$$Y = b_0 P_0 + b_1 P_1 + \ldots + b_p P_p, \qquad (22.25)$$
where the P's are polynomials in X, $P_j$ being of degree j. We shall determine the P's
so that
$$\Sigma(P_j P_k) = 0, \quad j \neq k, \qquad (22.26)$$

the summation extending over the observed values.
In minimising
$$\Sigma(y - b_0 P_0 - b_1 P_1 - \ldots - b_p P_p)^2,$$
we shall have equations such as
$$\Sigma(y P_j) - b_0 \Sigma(P_0 P_j) - \ldots - b_p \Sigma(P_p P_j) = 0,$$

and in virtue of the orthogonal relations (22.26), this reduces to

$$\Sigma(y P_j) - b_j \Sigma(P_j^2) = 0. \qquad (22.27)$$
Thus $b_j$ is determined simply by $P_j$; and if, having fitted a curve of order p, we wish to
go a step farther and add a term $b_{p+1} P_{p+1}$, the coefficients $b_0 \ldots b_p$ found from (22.27)
remain unaltered.
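
The property just stated may be checked numerically. The sketch below builds the polynomials by successive orthogonalisation of 1, X, X², . . . over the observed values, which is an assumed computational route and not the determinantal formulae derived later in the chapter; the function names are hypothetical, and the data are those of Table 22.1 (Example 22.3).

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial with ascending coefficients by Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def orthogonal_polynomials(xs, p):
    """P_0 ... P_p, leading coefficient unity, with sum(P_j P_k) = 0 over xs."""
    polys = []
    for j in range(p + 1):
        coeffs = [0.0] * j + [1.0]  # start from X**j
        for prev in polys:
            values = [poly_eval(prev, x) for x in xs]
            proj = (sum(x ** j * v for x, v in zip(xs, values))
                    / sum(v * v for v in values))
            for i, c in enumerate(prev):
                coeffs[i] -= proj * c
        polys.append(coeffs)
    return polys

def orthogonal_coefficients(xs, ys, polys):
    """b_j = sum(y P_j) / sum(P_j**2), following (22.27)."""
    out = []
    for poly in polys:
        values = [poly_eval(poly, x) for x in xs]
        out.append(sum(y * v for y, v in zip(ys, values))
                   / sum(v * v for v in values))
    return out

# Data of Table 22.1: temperature (degrees F.) and percentage loss in weight.
temps = [100, 105, 110, 115, 121, 132, 144, 153,
         163, 179, 191, 203, 212, 226, 237, 251]
loss = [3.71, 3.81, 3.86, 3.93, 3.96, 4.20, 4.34, 4.51,
        4.73, 5.35, 5.74, 6.14, 6.51, 6.98, 7.44, 7.76]

polys = orthogonal_polynomials(temps, 2)
b = orthogonal_coefficients(temps, loss, polys)
# Adding a cubic term leaves the earlier coefficients as they were.
b_cubic = orthogonal_coefficients(temps, loss, orthogonal_polynomials(temps, 3))
print(b)
print(b_cubic[:3])
```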


22.11. Furthermore, the use of these orthogonal polynomials will give us a very 
convenient method of determining step by step the goodness of fit of the regression line. 
We have 


$$U = \Sigma(y - b_0 P_0 - \ldots - b_p P_p)^2
 = \Sigma(y^2) - 2b_0 \Sigma(y P_0) - \ldots - 2b_p \Sigma(y P_p) + b_0^2 \Sigma(P_0^2) + \ldots + b_p^2 \Sigma(P_p^2).$$
But from (22.27) we may express $\Sigma(y P_j)$ in terms of $\Sigma(P_j^2)$, and we thus find
$$U = \Sigma(y^2) - b_0^2 \Sigma(P_0^2) - \ldots - b_p^2 \Sigma(P_p^2). \qquad (22.28)$$


Thus the effect of any term $b_j P_j$ is to reduce U by $b_j^2 \Sigma(P_j^2)$ and we may examine the effect
of this term on U separately. If we find that the addition of any term $b_j P_j$ does not
reduce U significantly, we may conclude that it is redundant (so far as concerns the 
representation of a regression line by a polynomial). 
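
The step-by-step reduction of U given by (22.28) may be sketched as follows. The polynomials are represented only by their values at the observed points and are obtained by successive orthogonalisation, which is an assumed computational route; the function names are hypothetical, and the data are those of Table 22.1 (Example 22.3). Whether a given reduction is significant is a separate question which the sketch does not attempt to answer.

```python
def orthogonal_values(xs, p):
    """Values at the sample points of orthogonal polynomials P_0 ... P_p."""
    columns = []
    for j in range(p + 1):
        col = [float(x) ** j for x in xs]
        for prev in columns:
            proj = (sum(c * v for c, v in zip(col, prev))
                    / sum(v * v for v in prev))
            col = [c - proj * v for c, v in zip(col, prev)]
        columns.append(col)
    return columns

def residual_sums(xs, ys, p):
    """U after fitting terms up to each order 0 ... p, following (22.28)."""
    u = sum(y * y for y in ys)
    out = []
    for col in orthogonal_values(xs, p):
        spp = sum(v * v for v in col)
        b = sum(y * v for y, v in zip(ys, col)) / spp
        u -= b * b * spp  # reduction due to the term b_j P_j alone
        out.append(u)
    return out

# Data of Table 22.1: temperature (degrees F.) and percentage loss in weight.
temps = [100, 105, 110, 115, 121, 132, 144, 153,
         163, 179, 191, 203, 212, 226, 237, 251]
loss = [3.71, 3.81, 3.86, 3.93, 3.96, 4.20, 4.34, 4.51,
        4.73, 5.35, 5.74, 6.14, 6.51, 6.98, 7.44, 7.76]

u = residual_sums(temps, loss, 3)
for order, value in enumerate(u):
    print(order, round(value, 4))
```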


22.12. We proceed then to derive expressions for the orthogonal polynomials in the
general case. Later we shall examine the important special case when the values of x
are equidistant (as, for instance, with grouped data and most time-series).


Put
$$P_p = \sum_{j=0}^{p} c_{pj} X^j. \qquad (22.29)$$
In this expression there are (p + 1) unknown constants c, and hence in all the polynomials
up to and including those of the pth order there are ½(p + 1)(p + 2) constants. The
orthogonal relations up to and including order p will then provide ½p(p + 1) conditions
on the c's, so that p + 1 constants are assignable at will. We will take one for each P and
assign it so that the coefficient of $X^j$ in $P_j$ is unity:

$$c_{jj} = 1. \qquad (22.30)$$
In particular $c_{00} = P_0 = 1$. The orthogonal relations are then just sufficient to determine
the other c's. For instance, for the set $c_{pj}$, j = 0 . . . p − 1, they are

$$\Sigma P_p P_0 = \Sigma P_p = 0$$
$$\Sigma P_p P_1 = 0$$
and so on. This system is clearly equivalent to the p equations
$$\begin{aligned}
\Sigma P_p &= 0 \\
\Sigma x P_p &= 0 \\
\ldots \quad & \\
\Sigma x^{p-1} P_p &= 0.
\end{aligned} \qquad (22.31)$$
On substituting for the P's from (22.29) we get

$$\begin{aligned}
c_{p0} \mu_0 + c_{p1} \mu_1 + \ldots + c_{p,p-1} \mu_{p-1} + \mu_p &= 0 \\
c_{p0} \mu_1 + c_{p1} \mu_2 + \ldots + c_{p,p-1} \mu_p + \mu_{p+1} &= 0 \\
\ldots \qquad & \\
c_{p0} \mu_{p-1} + c_{p1} \mu_p + \ldots + c_{p,p-1} \mu_{2p-2} + \mu_{2p-1} &= 0.
\end{aligned}$$
The solution may be expressed as a determinant in the usual way. Writing $\Delta^{(p-1)}$ in accordance
with (22.21) and $\Delta_{pj}^{(p)}$ for the signed minor of the term in the last row and (j + 1)th column
in (22.21), we find
$$c_{pj} = \frac{\Delta_{pj}^{(p)}}{\Delta^{(p-1)}}. \qquad (22.32)$$
This expresses the c's in terms of the ascertainable constants μ. It follows that

$$P_p = \frac{1}{\Delta^{(p-1)}} \begin{vmatrix}
\mu_0 & \mu_1 & \ldots & \mu_p \\
\mu_1 & \mu_2 & \ldots & \mu_{p+1} \\
\vdots & & & \vdots \\
\mu_{p-1} & \mu_p & \ldots & \mu_{2p-1} \\
1 & X & \ldots & X^p
\end{vmatrix}. \qquad (22.33)$$

We notice in particular that, in virtue of the diagonal symmetry of 4, we have 


Cig = Chje . (22.34) 
22.13. In virtue of (22.31) we have
$$\Sigma(P_p^2) = \Sigma(x^p P_p)$$
and thus, from (22.33) on multiplying the last row and summing,
$$\Sigma(P_p^2) = \frac{n \Delta^{(p)}}{\Delta^{(p-1)}}. \qquad (22.35)$$
Similarly
$$\Sigma(y P_p) = \frac{n \Delta_p^{(p)}}{\Delta^{(p-1)}}. \qquad (22.36)$$

Finally, from (22.27)
$$b_p = \frac{\Delta_p^{(p)}}{\Delta^{(p)}}. \qquad (22.37)$$

Our problem is now solved. We have expressed all the unknowns in terms of
calculable determinants.

We may note in passing that since the regression equation must remain covariant
under a change of origin, all the coefficients b except $b_0$ are seminvariant, and the origin
can thus be chosen at will. $b_0$ itself is the mean of the y-values.


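22.14. Explicitly for the polynomials we have (taking $\mu_1 = 0$, $\mu_2 = 1$)

$$P_0 = 1 \qquad (22.38)$$
$$P_1 = \begin{vmatrix} 1 & 0 \\ 1 & X \end{vmatrix} = X \qquad (22.39)$$
$$P_2 = \begin{vmatrix} 1 & 0 & 1 \\ 0 & 1 & \mu_3 \\ 1 & X & X^2 \end{vmatrix} \Big/ \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = X^2 - \mu_3 X - 1 \qquad (22.40)$$
$$P_3 = \begin{vmatrix} 1 & 0 & 1 & \mu_3 \\ 0 & 1 & \mu_3 & \mu_4 \\ 1 & \mu_3 & \mu_4 & \mu_5 \\ 1 & X & X^2 & X^3 \end{vmatrix} \Big/ \begin{vmatrix} 1 & 0 & 1 \\ 0 & 1 & \mu_3 \\ 1 & \mu_3 & \mu_4 \end{vmatrix}$$
$$\quad = X^3 + \frac{1}{\mu_4 - \mu_3^2 - 1} \left\{ -(\mu_5 - \mu_3 \mu_4 - \mu_3) X^2 + (\mu_3 \mu_5 - \mu_4^2 + \mu_4 - \mu_3^2) X + (\mu_5 - 2\mu_3 \mu_4 + \mu_3^3) \right\} \qquad (22.41)$$

and so on. In particular, if the population is normal,

$$P_3 = X^3 - 3X, \text{ etc.},$$

the polynomials in this case reducing to the Tchebycheff-Hermite functions (6.20) which
we know to form an orthogonal system in the normal case.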


Example 22.3. Ungrouped Data 
Table 22.1 shows the relationship between the percentage loss in weight (Y) and the 
temperature (X) in a number of samples of soil. We require to find the regression of Y on X. 
TABLE 22.1


Fitting of Curvilinear Regression for Ungrouped Data
(Data from J. R. H. Coutts, J. Agr. Sci., 20, 541.)


Percentage Loss      Temperature
in Weight.           (degrees F.).
      Y                   X
    3.71                 100
    3.81                 105
    3.86                 110
    3.93                 115
    3.96                 121
    4.20                 132
    4.34                 144
    4.51                 153
    4.73                 163
    5.35                 179
    5.74                 191
    6.14                 203
    6.51                 212
    6.98                 226
    7.44                 237
    7.76                 251

For the sums required we find— 
n = 16, Σ(y) = 82.97, Σ(y²) = 459.4368 ;
Σ(x) = 2642, Σ(x²) = 474,050, Σ(x³) = 91,244,582 ;
Σ(x⁴) = 18,553,164,842, Σ(x⁵) = 3,930,294,225,302 ;
Σ(x⁶) = 858,077,668,755,250 ; Σ(yx) = 14,736.19 ;
Σ(yx²) = 2,819,909.45, Σ(yx³) = 571,902,362.11.
These can be run off fairly quickly on a machine. We have not bothered to take a different 
mean from those given, but in general a certain amount of arithmetic can be saved by 
so doing. 
Considering first of all the straightforward approach of (22.24), we have for the straight
line of closest fit,

$$\begin{vmatrix}
Y & 1 & X \\
82.97 & 16 & 2642 \\
14{,}736.19 & 2642 & 474{,}050
\end{vmatrix} = 0,$$
reducing to
$$Y = 0.660 + 2.741 \left(\frac{X}{100}\right). \qquad (22.42)$$

We have put $n\mu_j$ instead of $\mu_j$ in the second and third rows of the determinant, as we are
clearly entitled to do.
Similarly we find for the second- and third-order parabolas

$$Y = 3.551 - 0.929 \left(\frac{X}{100}\right) + 1.070 \left(\frac{X}{100}\right)^2 \qquad (22.43)$$

$$Y = 7.783 - 8.940 \left(\frac{X}{100}\right) + 5.875 \left(\frac{X}{100}\right)^2 - 0.919 \left(\frac{X}{100}\right)^3. \qquad (22.44)$$
Fig. 22.1 shows the straight line and cubic fitted to the data by these means. An examination
of the coefficients in the equations illustrates the point made above, that as successive
terms are added to the polynomials the coefficients of all terms may alter very considerably.

[Fig. 22.1. Straight Line and Cubic Parabola of Closest Fit to the Data of Table 22.1. Horizontal axis: temperature (degrees), 100 to 260; vertical axis: percentage loss in weight.]


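Consider now the alternative approach by the use of orthogonal polynomials. By
the use of equations (22.33) we have

$$P_1 = \frac{1}{16}\begin{vmatrix} 16 & 2642 \\ 1 & X \end{vmatrix} = X - 165.125.$$

$$P_2 = \begin{vmatrix} 16 & 2642 & 474{,}050 \\ 2642 & 474{,}050 & 91{,}244{,}582 \\ 1 & X & X^2 \end{vmatrix} \bigg/ \begin{vmatrix} 16 & 2642 \\ 2642 & 474{,}050 \end{vmatrix} = X^2 - 343.137X + 27{,}032.435.$$

$$P_3 = \begin{vmatrix} 16 & 2642 & 474{,}050 & 91{,}244{,}582 \\ 2642 & 474{,}050 & 91{,}244{,}582 & 18{,}553{,}164{,}842 \\ 474{,}050 & 91{,}244{,}582 & 18{,}553{,}164{,}842 & 3{,}930{,}294{,}225{,}302 \\ 1 & X & X^2 & X^3 \end{vmatrix} \bigg/ \begin{vmatrix} 16 & 2642 & 474{,}050 \\ 2642 & 474{,}050 & 91{,}244{,}582 \\ 474{,}050 & 91{,}244{,}582 & 18{,}553{,}164{,}842 \end{vmatrix}$$

$$= X^3 - 522.940X^2 + 87{,}182.434X - 4{,}605{,}047.$$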
The b-coefficients are given by (22.37), the determinants in the numerator having been
already tabulated in finding the P's. We have

$$b_0 = 5.1856, \quad b_1 = \frac{2.7409}{100}, \quad b_2 = \frac{1.0695}{100^2}, \quad b_3 = -\frac{0.91889}{100^3},$$

these being the values already found in arriving at (22.42) to (22.44). Thus

$$Y = 5.1856 + \frac{2.7409}{100}(X - 165.125) + \frac{1.0695}{100^2}(X^2 - 343.137X + 27{,}032.4) - \frac{0.91889}{100^3}(X^3 - 522.940X^2 + 87{,}182.4X - 4{,}605{,}047). \tag{22.45}$$

If we stop at the second term we have

$$Y = 5.1856 + \frac{2.7409}{100}(X - 165.125) = 0.660 + 2.741\left(\frac{X}{100}\right),$$

which is the same as (22.42), as of course it must be. Similarly, if we stop at the third or
fourth terms we find equations (22.43) or (22.44).
Now consider the fit of the regression line. We have from (22.35),

$$b_j^2\,\Sigma(P_j^2) = n b_j^2 \frac{\Delta^{(j)}}{\Delta^{(j-1)}} = b_j\,\Sigma(YP_j).$$

The determinants in this expression have already been evaluated in finding the regression
line. Remembering that $\Sigma(y^2) = 459.436$ we obtain the following:

     j        b_j                  b_j^2 Σ(P_j^2)     U (equation (22.28))
     0        5.1856                  430.247              29.189
     1        2.7409 x 10^-2           28.390               0.799
     2        1.0695 x 10^-4            0.669               0.130
     3       -0.91889 x 10^-6           0.080               0.050

In calculations of this kind it is as well to take $b_j$ to an extra place of decimals, as the value
of U is rather sensitive to small errors of rounding up. Even so, the last figure in U is
unreliable.

From the values of U it is clear that the fit is greatly improved by taking a quadratic 
term, and still further improved by adding the cubic term. How far a quartic term would 
improve matters cannot be decided without ascertaining the term. We have, however, 
not proceeded beyond the third degree because to do so would require moments of the 
eighth order. For a small population such as this, which in practical applications would 
be considered as a sample only, the errors in higher moments would probably be considerable. 

The reader who works through the arithmetic of this example will find that there is 
about the same labour involved in either method. It is in the fitting of higher order terms 
that the method of orthogonal polynomials shows its superiority. In practical cases it 
is preferable to avoid the large numbers arising from the evaluation of determinants by 
a modification of the procedure given in 22.27 below.
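
The arithmetic of this example can be retraced by machine. The sketch below is ours and makes one substitution that should be stated plainly: instead of evaluating the determinants of (22.33) it builds the values of the orthogonal polynomials at the sample points by a Gram-Schmidt process, which yields the same polynomials (leading coefficient unity) without the large numbers. The function names are our own.

```python
# Soil data of Table 22.1: temperature X and percentage loss in weight Y.
X = [100, 105, 110, 115, 121, 132, 144, 153, 163, 179, 191, 203, 212, 226, 237, 251]
Y = [3.71, 3.81, 3.86, 3.93, 3.96, 4.20, 4.34, 4.51, 4.73, 5.35, 5.74, 6.14,
     6.51, 6.98, 7.44, 7.76]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthogonal_columns(x, degree):
    """Values of P_0..P_degree at the sample points (Gram-Schmidt on 1, x, x^2, ...)."""
    cols = []
    for j in range(degree + 1):
        col = [float(xi) ** j for xi in x]
        for p in cols:                       # remove the part lying along each earlier P
            k = dot(col, p) / dot(p, p)
            col = [c - k * pi for c, pi in zip(col, p)]
        cols.append(col)
    return cols

def fit(x, y, degree):
    """Coefficients b_j and the residual sum of squares U after each successive term."""
    b, U = [], []
    resid = dot(y, y)
    for p in orthogonal_columns(x, degree):
        bj = dot(y, p) / dot(p, p)
        resid -= bj * dot(y, p)              # each term removes b_j * sum(Y P_j)
        b.append(bj)
        U.append(resid)
    return b, U

b, U = fit(X, Y, 3)
for j, (bj, uj) in enumerate(zip(b, U)):
    print(j, bj, round(uj, 3))
```

The printed coefficients and residuals may be compared with the table above; small differences in the last figure of U are to be expected, since the tabulated values were obtained from rounded coefficients.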


Example 22.4. Grouped Data 

In Example 14.1 (vol. I, p. 331) we considered the correlation between age and highest 
audible pitch in 3379 subjects and found the linear regressions. Let us take the work 
a stage further. 


For the data of the table (X = age, Y = pitch) we find:

$$\Sigma(y) = -708; \quad \Sigma(y^2) = 8894; \quad \Sigma(yx) = -12{,}535;$$
$$\Sigma(x) = 2604; \quad \Sigma(x^2) = 47{,}392; \quad \Sigma(x^3) = 387{,}498;$$
$$\Sigma(x^4) = 4{,}842{,}172; \quad \Sigma(x^5) = 62{,}401{,}794; \quad \Sigma(x^6) = 883{,}576{,}012.$$

As a variation on the procedure of the previous example, we will convert these figures
to moments about the mean (with Sheppard's corrections) and put them in standard measure.
We find:

$$\mu'_{01} = -0.209{,}529; \quad \mu_{02} = 2.504{,}904;$$
$$\mu'_{10} = 0.770{,}642; \quad \mu_{20} = 13.348{,}229.$$

In standard measure the other moments are

$$\mu_3 = 1.705{,}978; \quad \mu_4 = 6.295{,}759;$$
$$\mu_5 = 20.729{,}861; \quad \mu_6 = 78.409{,}775.$$

We may now use equations (22.38), etc., direct, and find

$$P_0 = 1, \quad P_1 = X, \quad P_2 = X^2 - 1.705X - 1, \quad P_3 = X^3 - 3.471X^2 - 0.376X + 3.560.$$

We now require the moments $\mu_{21}$ and $\mu_{31}$. We find

$$\Sigma(yx^2) = -112{,}495, \quad \Sigma(yx^3) = -1{,}399{,}039,$$

and hence, with Sheppard's corrections and in standard measure,

$$\mu_{21} = -1.177{,}920, \quad \mu_{31} = -4.215{,}958.$$

We now find, from (22.37),

$$b_0 = 0, \quad b_1 = -0.613{,}626, \quad b_2 = -0.055{,}064, \quad b_3 = 0.010{,}205.$$

The regression line of the third degree is then

$$Y = -0.6136X - 0.0551(X^2 - 1.705X - 1) + 0.0102(X^3 - 3.471X^2 - 0.376X + 3.560),$$

where the origin is at the mean and the units are in standard measure.


Standard Errors of Regression Coefficients 

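22.15. The standard errors of unknowns derived from least squares can be found
by the use of a result due originally to Gauss. Suppose $\alpha_j$ is the true value of $a_j$ and the
residuals $y - \Sigma_j \alpha_j x^j$ are distributed normally with variance $v$. Writing $da_j = \alpha_j - a_j$,
we have for the frequency function of the residuals:

$$f \propto \exp\Big\{-\frac{1}{2v}\,\Sigma\big(y - \Sigma_j \alpha_j x^j\big)^2\Big\}$$

$$\propto \exp\Big[-\frac{1}{2v}\Big\{\Sigma\big(y - \Sigma_j a_j x^j\big)^2 + \Sigma\big(\Sigma_j da_j\,x^j\big)^2\Big\}\Big]$$

($\Sigma$ denoting summation over the sample and $\Sigma_j$ over the values $a_0$ to $a_p$, and the cross-term
vanishing because the a's are minimal values);

$$\propto \text{constant} \times \exp\Big\{-\frac{1}{2v}\,\Sigma\big(\Sigma_j da_j\,x^j\big)^2\Big\}$$

$$\propto \exp\Big\{-\frac{1}{2v}\,\Sigma_{j,k}\big(da_j\,da_k\,\Sigma x^{j+k}\big)\Big\}$$

$$\propto \exp\Big\{-\frac{n}{2v}\,\Sigma_{j,k}\big(da_j\,da_k\,\mu'_{j+k}\big)\Big\}. \tag{22.46}$$

In the limit, then, the deviations are distributed in the bivariate normal form, and from
the results of 15.12 (vol. I, p. 376) it follows that

$$\operatorname{var} a_j = \frac{v}{n}\,\frac{\Delta^{(p)}_{jj}}{\Delta^{(p)}}, \tag{22.47}$$

for the determinant whose terms are $\mu'_{j+k}$ is in fact the determinant we have already defined
as $\Delta^{(p)}$, and $\Delta^{(p)}_{jj}$ is the minor of the item in the jth row and column.

Now $v$ is the variance of deviations from the theoretical regression line, and in terms
of variations about the observed line we have, remembering the result of 18.17:

$$\operatorname{var} a_j = \frac{\Delta^{(p)}_{jj}}{\Delta^{(p)}}\,\frac{\operatorname{var} e}{n - p - 1}. \tag{22.48}$$

Since the correlation ratio of y on x is given by

$$\operatorname{var} e = \operatorname{var} y\,(1 - \eta^2),$$

we have also

$$\operatorname{var} a_j = \frac{\Delta^{(p)}_{jj}}{\Delta^{(p)}}\,\frac{(1 - \eta^2)\operatorname{var} y}{n - p - 1}. \tag{22.49}$$

For large samples the replacement of $n$ by $n - p - 1$ in the denominator is an unnecessary
refinement.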

22.16. For the case of orthogonal polynomials the results apply with a slight but
important simplification. The coefficient $b_j$ is the same as $a_j$ if polynomials up to order j
only are fitted, and hence, since $\Delta^{(j)}_{jj} = \Delta^{(j-1)}$, we have

$$\operatorname{var} b_j = \frac{\Delta^{(j-1)}}{\Delta^{(j)}}\,\frac{(1 - \eta^2)\operatorname{var} y}{n - j - 1}. \tag{22.50}$$

The same result follows by modifying (22.46), which for orthogonal polynomials becomes

$$f \propto \exp\Big\{-\frac{1}{2v}\,\Sigma_j\big((db_j)^2\,\Sigma P_j^2\big)\Big\}, \tag{22.51}$$

showing that the b's are independently and normally distributed with variance

$$\operatorname{var} b_j = \frac{v}{\Sigma P_j^2},$$

reducing to (22.50) in virtue of (22.35).


22.17. If the parent population is normal, $\eta = \rho$, and the determinants $\Delta^{(j)}$ can be
evaluated explicitly in terms of the variance of x. In fact,

$$\frac{\Delta^{(j)}}{\Delta^{(j-1)}} = j!\,(\operatorname{var} x)^j, \tag{22.52}$$

and hence

$$\operatorname{var} b_j = \frac{1}{j!\,(\operatorname{var} x)^j}\,\frac{(1 - \rho^2)\operatorname{var} y}{n - j - 1}, \tag{22.53}$$

or, in standard measure,

$$\operatorname{var} b_j = \frac{1}{n - j - 1}\cdot\frac{1 - \rho^2}{j!}. \tag{22.54}$$

Equation (22.52) can be found by evaluating the determinants in the ordinary way, but
it follows more simply from the consideration that $\Delta^{(j)}/\Delta^{(j-1)}$ is equal to $\frac{1}{n}\Sigma P_j^2$, which, in the
normal case, is for large samples equal to $E(P_j^2) = j!\,(\operatorname{var} x)^j$ (6.22, vol. I, p. 147, with
a change of scale).
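
The value $E(P_j^2) = j!$ for a normal variate in standard measure is easily confirmed by direct calculation. The following sketch is ours: it builds the Tchebycheff-Hermite polynomials from the familiar recurrence $He_{k+1} = xHe_k - kHe_{k-1}$ (an assumption not quoted in the text) and uses the even moments $1, 3, 15, \ldots$ of the normal distribution.

```python
def hermite(j):
    """Coefficients (lowest power first) of the Tchebycheff-Hermite polynomial He_j."""
    prev, cur = [1], [0, 1]
    if j == 0:
        return prev
    for k in range(1, j):
        nxt = [0] + cur                      # x * He_k
        for i, c in enumerate(prev):
            nxt[i] -= k * c                  # minus k * He_{k-1}
        prev, cur = cur, nxt
    return cur

def normal_moment(r):
    """E(x^r) for a standard normal variate: zero for odd r, 1.3.5...(r-1) for even r."""
    if r % 2:
        return 0
    m = 1
    for k in range(1, r, 2):
        m *= k
    return m

def expected_square(j):
    h = hermite(j)
    return sum(a * b * normal_moment(p + q)
               for p, a in enumerate(h) for q, b in enumerate(h))

squares = [expected_square(j) for j in range(6)]
print(squares)
```

The list printed is the sequence of factorials $0!, 1!, \ldots, 5!$, in agreement with (22.52) when var x = 1.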


22.18. The advantages of using orthogonal polynomials instead of powers of X 
are apparent in the forms taken by the standard errors of the coefficients a and b. The 
latter are independent of the order of the polynomial fitted and can be tested once and for 
all. The former do not possess this advantage. It seems preferable, therefore, as a matter 
of technique, to work with orthogonal polynomials throughout, whenever regressions of
order higher than the first are likely to require investigation. 


Example 22.5 

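Consider again the data of Example 22.4 (regression of highest audible pitch on age).
We have there expressed the regression line in standard measure and in the orthogonal
form, and may therefore use equation (22.50) in the form

$$\operatorname{var} b_1 = \frac{1 - \eta^2}{n}\,\frac{\Delta^{(0)}}{\Delta^{(1)}}, \quad \operatorname{var} b_2 = \frac{1 - \eta^2}{n}\,\frac{\Delta^{(1)}}{\Delta^{(2)}}, \quad \operatorname{var} b_3 = \frac{1 - \eta^2}{n}\,\frac{\Delta^{(2)}}{\Delta^{(3)}}.$$

(The sample number n is so large that we can ignore the element $-(j + 1)$ in the divisor.)
The determinants required are already known, having been ascertained in the course of
the work. We have

$$\frac{\Delta^{(0)}}{\Delta^{(1)}} = 1, \quad \frac{\Delta^{(1)}}{\Delta^{(2)}} = 0.4189, \quad \frac{\Delta^{(2)}}{\Delta^{(3)}} = 0.0985.$$

We also require $\eta$, which was found in Example 14.11 (vol. I, p. 352) to be $\eta_{yx} = 0.6231$.
Thus $1 - \eta^2 = 0.6117$. We find

$$\operatorname{var} b_1 = \frac{1.8104}{10^4}, \quad \operatorname{var} b_2 = \frac{0.7584}{10^4}, \quad \operatorname{var} b_3 = \frac{0.1783}{10^4}.$$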
The values of the b's and their standard errors (the square roots of the variances just found) are then

    b_1 = -0.6136 ± 0.0135
    b_2 = -0.0551 ± 0.0087
    b_3 =  0.0102 ± 0.0042

In all cases we should judge the coefficients significant, as being more than twice the standard
error. Although, therefore, the second- and third-order terms are small and the regression
is approximately linear, the deviation from linearity is not merely a chance effect.


Exact Significance Tests in the Normal Case 

22.19. When the parent population is normal, more exact tests than those derived
from the use of standard errors may be obtained. We have already seen (14.21, vol. I,
p. 348) that a function dependent only on sample values and the first regression coefficient
$b_1$ was distributed in "Student's" form. We proceed to generalise this result.

Consider in the first place the linear regression equation

$$Y = \bar{y} + b_1(X - \bar{x}), \tag{22.55}$$

and let $\beta_1$ be the population value of $b_1$ and $\sigma_2^2$ the variance of y in the population. Since
the parent is normal, the variance of y for any fixed value of x is $\sigma_2^2$.

Our estimate of $\beta_1$ is

$$b_1 = \frac{\Sigma\,y(x - \bar{x})}{\Sigma(x - \bar{x})^2}, \tag{22.56}$$

where summation takes place over the sample values. Thus for fixed values of x we have

$$\operatorname{var} b_1 = \frac{\Sigma(x - \bar{x})^2 \operatorname{var} y}{\{\Sigma(x - \bar{x})^2\}^2} = \frac{\sigma_2^2}{\Sigma(x - \bar{x})^2}. \tag{22.57}$$

Thus, since the mean of the distribution of $b_1$ is $\beta_1$, we see that, for samples having the
same x's as those observed, $b_1$ is normally distributed about mean $\beta_1$ with variance given
by (22.57), normally because it is a linear function of the y's which are themselves normal.
Consequently,

$$\frac{(b_1 - \beta_1)\sqrt{\Sigma(x - \bar{x})^2}}{\sigma_2} \tag{22.58}$$

is distributed normally about zero mean with unit variance.

If $\sigma_2$ were known this would provide a test of significance of $b_1$ in the ordinary way;
but in fact $\sigma_2$ is not known and the substitution of an estimate distributed in the Type III
form brings in the t-distribution in the usual way. We take as our estimator of $\sigma_2$ the
function s, where

$$s^2 = \frac{1}{n - 2}\,\Sigma(y - Y')^2, \tag{22.59}$$

and $Y'$ represents the values "predicted" by the regression line, that is, the values

$$Y' = \bar{y} + b_1(x - \bar{x}). \tag{22.60}$$

Thus $s^2$ is based on the sum of squares of residuals. We shall show presently that $s^2$ is
distributed in the Type III form with $\nu = n - 2$ degrees of freedom independently of $b_1 - \beta_1$.
It follows that

$$t = \frac{(b_1 - \beta_1)\sqrt{\Sigma(x - \bar{x})^2}\,\sqrt{n - 2}}{\sqrt{\Sigma(y - Y')^2}}$$

is distributed as "Student's" t with $\nu = n - 2$.

A given value $\beta_1$ may be tested accordingly. But we notice that the inference is a
conditional one, that is to say, we are considering the distribution of t in a sub-population
for which the x's are the same as those actually observed. (Cf. 21.47.)
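
As an illustration of the test, and not as part of the original example, the statistic may be computed for the straight line fitted to the soil data of Table 22.1. The function name `slope_t` is our own, and the hypothetical value tested is $\beta_1 = 0$.

```python
from math import sqrt

def slope_t(x, y, beta1=0.0):
    """Return b_1 of (22.56), Student's t for the hypothesis beta1, and the
    degrees of freedom n - 2."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum(yi * (xi - xbar) for xi, yi in zip(x, y)) / sxx
    # Sum of squares of residuals about the fitted line (22.60).
    resid = sum((yi - ybar - b1 * (xi - xbar)) ** 2 for xi, yi in zip(x, y))
    t = (b1 - beta1) * sqrt(sxx) * sqrt(n - 2) / sqrt(resid)
    return b1, t, n - 2

X = [100, 105, 110, 115, 121, 132, 144, 153, 163, 179, 191, 203, 212, 226, 237, 251]
Y = [3.71, 3.81, 3.86, 3.93, 3.96, 4.20, 4.34, 4.51, 4.73, 5.35, 5.74, 6.14,
     6.51, 6.98, 7.44, 7.76]

b1, t, dof = slope_t(X, Y)
print(b1, t, dof)
```

The value of t so found is to be referred to "Student's" distribution with 14 degrees of freedom; the inference remains conditional on the observed x's, as noted above.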
22.20. To establish the foregoing result we have to show that $\Sigma(y - Y')^2$, the sum
of squares of residuals about the observed regression line, is distributed in the Type III
form with $n - 2$ degrees of freedom. This is a particular case of a general theorem we
shall prove at the beginning of the next chapter, but we will sketch an ad hoc proof here
for the sake of completeness.

Since the population is normal, the deviations of y from the true regression line for
fixed x's, $Y = \beta_0 + \beta_1(X - \bar{x})$, where $\beta_0$ is the parent mean of y, is normal with variance
$\sigma_2^2$. Now

$$\frac{(n - 2)s^2}{\sigma_2^2} = \frac{1}{\sigma_2^2}\,\Sigma(y - Y')^2 = \frac{1}{\sigma_2^2}\,\Sigma\{y - b_0 - b_1(x - \bar{x})\}^2$$

$$= \frac{1}{\sigma_2^2}\,\Sigma\{y - \beta_0 - \beta_1(x - \bar{x}) - (b_0 - \beta_0) - (b_1 - \beta_1)(x - \bar{x})\}^2.$$

The coefficients $b_0$ and $b_1$ were chosen so as to minimise this sum, and hence

$$\frac{(n - 2)s^2}{\sigma_2^2} = \frac{1}{\sigma_2^2}\,\Sigma\{y - \beta_0 - \beta_1(x - \bar{x})\}^2 - \frac{n}{\sigma_2^2}(b_0 - \beta_0)^2 - \frac{1}{\sigma_2^2}(b_1 - \beta_1)^2\,\Sigma(x - \bar{x})^2. \tag{22.61}$$

The first term is the sum of squares of n normal variates with zero mean and unit variance;
the second is also such a variate, for it is the square of the deviation of the mean of y about
its true value divided by the variance $\sigma_2^2/n$; and the third term is also such a variate, as
shown above.

It does not follow immediately that $(n - 2)s^2/\sigma_2^2$ is distributed as the sum of squares of
$n - 2$ normal variates in standard measure, for the constituent items might be correlated.
Let us then find an orthogonal transformation to new variates $\xi_1, \ldots, \xi_n$ linearly related
to the n normal variates $y - \beta_0 - \beta_1(x - \bar{x})$. These also will be normally and independently
distributed. In particular (remembering that our summations refer to the
y's and x's, but the x's are constant for our distributions), take

$$\xi_1 = \frac{1}{\sigma_2\sqrt{n}}\,\Sigma\{y - \beta_0 - \beta_1(x - \bar{x})\} = \frac{\sqrt{n}}{\sigma_2}(b_0 - \beta_0),$$

$$\xi_2 = \frac{1}{\sigma_2}\,\Sigma\Big[\frac{x - \bar{x}}{\sqrt{\Sigma(x - \bar{x})^2}}\{y - \beta_0 - \beta_1(x - \bar{x})\}\Big] = \frac{1}{\sigma_2}(b_1 - \beta_1)\sqrt{\Sigma(x - \bar{x})^2}.$$

$\xi_1$ and $\xi_2$ are then normal variates in standard measure. Moreover they are orthogonal since

$$\Sigma\Big\{\frac{1}{\sqrt{n}}\cdot\frac{x - \bar{x}}{\sqrt{\Sigma(x - \bar{x})^2}}\Big\} = k\,\Sigma(x - \bar{x}) = 0.$$

Consequently our transformation exhibits the first term on the right in (22.61) as $\sum_{j=1}^{n}\xi_j^2$ and
the second and third as $\xi_1^2$ and $\xi_2^2$. Thus the total is distributed as $\sum_{j=3}^{n}\xi_j^2$, which is the
result required.
We may compare the result of 18.17 (in which we saw that the mean value of $\Sigma\epsilon^2$
was $n$, whereas that of $\Sigma e^2$ was $n - p - 1$, one degree of freedom having been lost in the
sum of squares of residuals for every constant estimated) and the approximate result of
21.20, in which $\chi^2$ had to lose a degree of freedom for each constant fitted by maximum likelihood.
Fundamentally all these results are different aspects of the same thing and rest on the fact
that the variation of the sum of squares of normal variates in standard measure is spherically
symmetric, so that a hyperplane in the sample space "cuts" the distribution in a spherically
symmetric form of one lower degree of freedom.


Extension to Curvilinear Regression 
22.21. The foregoing result can be extended without difficulty to the case when
the regression is curvilinear. If

$$Y = b_0P_0 + b_1P_1 + \ldots + b_pP_p,$$

where the P's are orthogonal, then

$$b_j = \frac{\Sigma(yP_j)}{\Sigma(P_j^2)},$$

and we have also, for the variance of $b_j$ when the x's are fixed,

$$\operatorname{var} b_j = \frac{\sigma_2^2}{\Sigma P_j^2},$$

so that

$$\frac{(b_j - \beta_j)\sqrt{\Sigma P_j^2}}{\sigma_2}$$

is distributed normally with zero mean and unit variance. Taking as our estimate of $\sigma_2^2$

$$s^2 = \frac{1}{n - j - 1}\,\Sigma(y - Y')^2,$$

we see, as before, that

$$t = \frac{(b_j - \beta_j)\sqrt{n - j - 1}\,\sqrt{\Sigma P_j^2}}{\sqrt{\Sigma(y - Y')^2}} \tag{22.62}$$

is distributed as "Student's" t with $\nu = n - j - 1$ degrees of freedom.

It will be observed that in this and the previous section we have not assumed anything
about the distribution in x-arrays. We have merely supposed that for any given x, y is
normally distributed with constant variance.

Example 22.6 

Consider again the soil data of Example 22.3. We found, for the cubic term in the
parabola, a coefficient of $-0.9189 \times 10^{-6}$. Is this significant?

Here $b_j - \beta_j = -0.9189 \times 10^{-6}$ for $j = 3$;

$$\sqrt{n - j - 1} = \sqrt{16 - 4} = 3.464.$$

We have already found $\Sigma(y - Y')^2 = U$, namely

$$U = 0.050.$$

We further require $\Sigma P_3^2$, which has been obtained incidentally in the working of Example
22.3 and is equal to $9.31525 \times 10^{10}$. Hence

$$-t = \frac{0.9189 \times 10^{-6}\,(3.464)\,(3.052 \times 10^{5})}{0.2236} = 4.3.$$

This is highly significant.
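
The same arithmetic may be set out as a short routine. This is a sketch of our own (the name `coefficient_t` is hypothetical); the numerical inputs are those quoted in the example.

```python
from math import sqrt

def coefficient_t(b, beta, n, j, sum_p_sq, resid_ss):
    """t of (22.62) for the coefficient of the orthogonal polynomial of order j,
    together with its degrees of freedom n - j - 1."""
    dof = n - j - 1
    t = (b - beta) * sqrt(dof) * sqrt(sum_p_sq) / sqrt(resid_ss)
    return t, dof

# Cubic term of Example 22.3, tested against the hypothetical value zero.
t, dof = coefficient_t(b=-0.9189e-6, beta=0.0, n=16, j=3,
                       sum_p_sq=9.31525e10, resid_ss=0.050)
print(t, dof)
```

Since U is reliable to about two figures only, the value of t should not be quoted to more than one decimal place.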
Case when the Independent Variate proceeds by Equal Steps

22.22. An important special case arises when the independent variate has values
which are equidistant, as, for instance, in most time-series and in grouped data. If we take
the interval between successive values of x as our unit, the variate-values may, by a suitable
choice of origin, be taken as 0, 1, 2, . . . $n - 1$. The various moment-functions
$\mu'_j$ entering into the expressions for polynomials, etc., may be written down once for
all. Furthermore, this case lends itself to simpler summatory methods of forming the
actual polynomial values and the residuals.


22.23. For a set of values 0, 1, 2, . . . $n - 1$, we have

$$\Sigma x = \tfrac{1}{2}n(n - 1), \quad \Sigma x^2 = \tfrac{1}{6}n(n - 1)(2n - 1), \text{ etc.}$$

Thus $\mu'_1 = \tfrac{1}{2}(n - 1)$, $\mu_2 = \tfrac{1}{12}(n^2 - 1)$, $\mu_3 = 0$, etc.
From (22.38) and similar equations we then find

$$P_1 = X - \tfrac{1}{2}(n - 1), \qquad P_2 = P_1^2 - \frac{n^2 - 1}{12}, \tag{22.63}$$

and so on. The polynomials may be obtained more systematically as follows:

We show first of all that

$$\sum_{j=0}^{p}\binom{n-1}{j}\frac{\Delta^j P_p}{j + q} = 0, \qquad q = 1, 2, \ldots, p, \tag{22.64}$$

where $\Delta^j P_p$ is the jth terminal difference of $P_p$ and the x's range from 0 to $n - 1$. In fact,
from Newton's interpolation formula,

$$P_p = \sum_{j=0}^{p}\frac{x^{[j]}}{j!}\,\Delta^j P_p, \tag{22.65}$$

where $x^{[j]} = x(x - 1)\ldots(x - j + 1)$, and since the P's are orthogonal,

$$\sum_{x}(x + q - 1)^{[q-1]}P_p = 0, \qquad q = 1, 2, \ldots, p. \tag{22.66}$$

Substituting from (22.65), we find for the term in $\Delta^j P_p$:

$$\frac{\Delta^j P_p}{j!}\sum_x (x + q - 1)^{[q-1]}x^{[j]} = \frac{\Delta^j P_p}{j!}\sum_x (x + q - 1)^{[j+q-1]} = \frac{\Delta^j P_p}{j!}\cdot\frac{(n + q - 1)!}{(j + q)\,(n - j - 1)!}.$$

Thus for all q from 1 to p we have

$$0 = \sum_{j=0}^{p}\frac{(n + q - 1)!}{j!\,(n - j - 1)!}\cdot\frac{\Delta^j P_p}{j + q} = \frac{(n + q - 1)!}{(n - 1)!}\sum_{j=0}^{p}\binom{n-1}{j}\frac{\Delta^j P_p}{j + q},$$

whence follows (22.64). We now find functions obeying these conditions.

Consider

$$y = C(x + p)^{[p]}. \tag{22.67}$$

This is a polynomial of degree p, and if for x = 0, 1, . . . p it assumes the values $y_0, \ldots, y_p$
we have

$$y = x^{[p+1]}\sum_{j=0}^{p}\frac{(-1)^{p-j}\,y_j}{j!\,(p - j)!\,(x - j)}, \tag{22.68}$$

for this also is of degree p and has the right values at x = 0, . . . p. Taking now

$$y_j = (-1)^{p-j}\frac{(n - 1)!\,(p - j)!}{(n - j - 1)!}\,\Delta^j P_p, \tag{22.69}$$

we find that for $x = -q$

$$y(-q) = C(p - q)^{[p]} = (-1)^p\frac{(p + q)!}{(q - 1)!}\sum_{j=0}^{p}\binom{n-1}{j}\frac{\Delta^j P_p}{j + q}. \tag{22.70}$$

Now from the definition of y this clearly vanishes for $-x = q = 1, \ldots, p$, and thus
(22.70) is zero. Comparing it with (22.64) we see that the conditions are satisfied if we
give to $y_j$ the value of (22.69), i.e.

$$\Delta^j P_p = (-1)^{p-j}\frac{(n - j - 1)!}{(n - 1)!\,(p - j)!}\,y_j = (-1)^{p-j}\,C\,\frac{(n - j - 1)!\,(p + j)!}{(n - 1)!\,(p - j)!\,j!}. \tag{22.71}$$

The constant C is evaluated by the fact that the coefficient of $X^p$ in $P_p$ is unity, giving
$\Delta^p P_p = p!$. This gives

$$C = \frac{(p!)^2\,(n - 1)!}{(2p)!\,(n - p - 1)!}. \tag{22.72}$$

Finally, substituting in (22.65), we find

$$P_p = \frac{(p!)^2}{(2p)!}\sum_{j=0}^{p}(-1)^{p+j}\frac{(p + j)!\,(n - j - 1)!}{(j!)^2\,(p - j)!\,(n - p - 1)!}\,X^{[j]}, \tag{22.73}$$

where by convention the term $X^{[j]}$ is unity for j = 0. The first six polynomials are

$$\begin{aligned}
P_1 &= X - \tfrac{1}{2}(n - 1) \\
P_2 &= P_1^2 - \frac{n^2 - 1}{12} \\
P_3 &= P_1^3 - \frac{3n^2 - 7}{20}P_1 \\
P_4 &= P_1^4 - \frac{3n^2 - 13}{14}P_1^2 + \frac{3(n^2 - 1)(n^2 - 9)}{560} \\
P_5 &= P_1^5 - \frac{5(n^2 - 7)}{18}P_1^3 + \frac{15n^4 - 230n^2 + 407}{1008}P_1 \\
P_6 &= P_1^6 - \frac{5(3n^2 - 31)}{44}P_1^4 + \frac{5n^4 - 110n^2 + 329}{176}P_1^2 - \frac{5(n^2 - 1)(n^2 - 9)(n^2 - 25)}{14784}
\end{aligned} \tag{22.74}$$
More values are given by Allan (1930), to whom the above derivation of (22.73)
is due.

Values of the polynomials up to and including the fifth are given in Fisher and Yates'
Statistical Tables up to n = 52.
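
Where tables are not to hand, the polynomials of (22.74) are readily evaluated by machine. The sketch below is ours (the function name `orthogonal_values` is hypothetical); it uses exact rational arithmetic and verifies orthogonality for n = 13, the case required in Example 22.7.

```python
from fractions import Fraction as F

def orthogonal_values(n, x):
    """P_1..P_6 of (22.74) at the abscissa x, where x runs over 0, 1, ..., n-1."""
    p1 = F(x) - F(n - 1, 2)
    n2 = n * n
    p2 = p1**2 - F(n2 - 1, 12)
    p3 = p1**3 - F(3 * n2 - 7, 20) * p1
    p4 = p1**4 - F(3 * n2 - 13, 14) * p1**2 + F(3 * (n2 - 1) * (n2 - 9), 560)
    p5 = (p1**5 - F(5 * (n2 - 7), 18) * p1**3
          + F(15 * n2 * n2 - 230 * n2 + 407, 1008) * p1)
    p6 = (p1**6 - F(5 * (3 * n2 - 31), 44) * p1**4
          + F(5 * n2 * n2 - 110 * n2 + 329, 176) * p1**2
          - F(5 * (n2 - 1) * (n2 - 9) * (n2 - 25), 14784))
    return [p1, p2, p3, p4, p5, p6]

n = 13
table = [orthogonal_values(n, x) for x in range(n)]

# Sums of products of every pair of distinct polynomials over the n points.
cross = [sum(row[i] * row[j] for row in table) for i in range(6) for j in range(i)]
print(all(c == 0 for c in cross))

p2_column = [int(row[1]) for row in table]
p3_over_6 = [row[2] / 6 for row in table]
print(p2_column)
```

The columns `p2_column` and `p3_over_6` are those which appear, reduced to integers, in tables of the Fisher-Yates type.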


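22.24. We can now find an explicit expression for $\Sigma P_p^2$. Since the polynomials
are orthogonal we have

$$\Sigma P_p^2 = \sum_x (x + p)^{[p]}P_p,$$

which, by the argument resulting in (22.64), leads to

$$\Sigma P_p^2 = \frac{(n + p)!}{(n - 1)!}\sum_{j=0}^{p}\binom{n-1}{j}\frac{\Delta^j P_p}{j + p + 1}.$$

Putting $q = p + 1$ in (22.67) and (22.70), we have

$$y(-q) = C(-1)^p\,p! = (-1)^p\frac{(2p + 1)!}{p!}\sum_{j=0}^{p}\binom{n-1}{j}\frac{\Delta^j P_p}{j + p + 1},$$

whence, after a little rearrangement,

$$\sum_{j=0}^{p}\frac{(n + p)!\,\Delta^j P_p}{j!\,(n - j - 1)!\,(p + j + 1)} = \frac{C\,(p!)^2\,(n + p)!}{(2p + 1)!\,(n - 1)!},$$

and thus, substituting for C from (22.72), we find

$$\Sigma P_p^2 = \frac{(p!)^4}{(2p)!\,(2p + 1)!}\,n(n^2 - 1)(n^2 - 4)\ldots(n^2 - p^2). \tag{22.75}$$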
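
A direct check of (22.75) against the explicit sum (22.73) may be made as follows. The sketch is ours, with hypothetical function names, and assumes n greater than p so that every factorial is defined.

```python
from fractions import Fraction as F
from math import factorial as fact

def falling(x, j):
    """Factorial power x^[j] = x(x-1)...(x-j+1), taken as unity for j = 0."""
    out = 1
    for k in range(j):
        out *= x - k
    return out

def poly_value(p, n, x):
    """P_p at x (x = 0, ..., n-1) from the explicit sum (22.73)."""
    total = F(0)
    for j in range(p + 1):
        term = F(fact(p + j) * fact(n - j - 1),
                 fact(j) ** 2 * fact(p - j) * fact(n - p - 1))
        total += (-1) ** (p + j) * term * falling(x, j)
    return F(fact(p) ** 2, fact(2 * p)) * total

def sum_of_squares(p, n):
    """Closed form (22.75) for the sum of P_p^2 over the n points."""
    prod = n
    for k in range(1, p + 1):
        prod *= n * n - k * k
    return F(fact(p) ** 4 * prod, fact(2 * p) * fact(2 * p + 1))

n = 13
direct = [sum(poly_value(p, n, x) ** 2 for x in range(n)) for p in range(1, 5)]
closed = [sum_of_squares(p, n) for p in range(1, 5)]
print(direct == closed, closed)
```

For n = 13 the closed form gives 182, 2002, 572 x 36 and 68,068 x (12/7)^2 for the first four polynomials, which are the sums of squares taken from the tables in Example 22.7.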


22.25. It is also possible to express the orthogonal polynomials in terms of central 
differences. We quote without proof the results (for details of which see Allan, 1930) :— 


P! appr, y OY- —D CRESS . (22.76) 


Ars (oa B= aig pp — C 
2c [sri - 0)! 
where Ek" = Iciin 0H Es S E n . . (22.77) 


The series is summed from j = 0 until 2j > p, when the denominator vanishes and (p — 3)! 
is written for J’(p + 3) to preserve the factorial notation. In practice the polynomials 
for particular examples are not determined from (22.73) or (22.76) but by the use of tables, 
or by summation from differences in the manner of Example 22.9 below. 


Example 22.7 


For the fitting of a regression line in the case of equidistant intervals various methods 
are in use. A choice between them depends on the length of the series, the order of regres- 
sion to which it is desired to go, and the computing resources at the investigator’s disposal. 
We will illustrate two methods in this and the next example. 


TABLE 22.2

Fitting of Regression Line by Orthogonal Polynomials. Equidistant x-intervals.

     (1)      (2)         (3)          (4)       (5)          (6)
    Year.   Variate.   Population      P_2     (1/6) P_3   (7/12) P_4
              P_1      (millions).
                           Y
    1811      -6         10.16          22       -11          99
    1821      -5         12.00          11         0         -66
    1831      -4         13.90           2         6         -96
    1841      -3         15.91          -5         8         -54
    1851      -2         17.93         -10         7          11
    1861      -1         20.07         -13         4          64
    1871       0         22.71         -14         0          84
    1881       1         25.97         -13        -4          64
    1891       2         29.00         -10        -7          11
    1901       3         32.53          -5        -8         -54
    1911       4         36.07           2        -6         -96
    1921       5         37.89          11         0         -66
    1931       6         39.95          22        11          99

In Table 22.2, column 3 shows the population of England and Wales (in millions) 
for the years shown in column 1. These are at ten-yearly intervals, and the variate-values 
in units of 10 with origin at the mid-point of the range are given in column (2). These 
are the values of $P_1$.

The corresponding values of $P_2$, $P_3$ and $P_4$ are given in the last three columns. They
may be calculated direct from (22.74), but are most conveniently taken direct from the
Fisher-Yates tables.

We find, for n = 13,

$$\Sigma YP_1 = 474.77$$
$$\Sigma YP_2 = 123.19$$
$$\Sigma YP_3 = -39.38 \times 6 = -236.28$$
$$\Sigma YP_4 = -374.30 \times \tfrac{12}{7} = -641.657{,}143,$$

and, direct from the tables,

$$\Sigma P_1^2 = 182, \quad \Sigma P_2^2 = 2002, \quad \Sigma P_3^2 = 572 \times 36, \quad \Sigma P_4^2 = 68{,}068 \times \left(\tfrac{12}{7}\right)^2.$$

Hence, from equations of the type $b_j = \Sigma YP_j / \Sigma P_j^2$ we find

$$b_1 = 2.608{,}626, \quad b_2 = 0.061{,}533{,}467, \quad b_3 = -0.011{,}474{,}359, \quad b_4 = -0.003{,}207{,}699$$

and the quartic curve is

$$Y - 24.1608 = 2.6086X + 0.061{,}53(X^2 - 14) - 0.011{,}47(X^3 - 25X) - 0.003{,}208\left(X^4 - \tfrac{247}{7}X^2 + 144\right). \tag{22.78}$$

We can now find the residuals for each term in this equation. We find

$$\Sigma Y^2 = 8839.9389, \qquad \Sigma Y = 314.09.$$

Hence the sum of squares of Y about the mean of Y,

$$\Sigma(Y - \bar{Y})^2 = 1251.283.$$
Thus we have:

                                                    Contribution.   Residual Sum of Squares.
    Original variation                                   .               1251.283
    Contribution of first term  = b_1 Σ(YP_1)         1238.497             12.786
    Contribution of second term = b_2 Σ(YP_2)            7.580              5.206
    Contribution of third term  = b_3 Σ(YP_3)            2.711              2.495
    Contribution of fourth term = b_4 Σ(YP_4)            2.058              0.437

For the variance of the residual elements we divide by the number of degrees of freedom
$(n - j - 1)$ and obtain

    Residual Sum of Squares.     Divisor.     Residual Variance.
            12.786                 11              1.162
             5.206                 10              0.521
             2.495                  9              0.277
             0.437                  8              0.055


[Fig. 22.2. Cubic (full line) and Quartic (broken line) Parabolas fitted to the Data of Table 22.2. Horizontal axis: years, 1821 to 1921; vertical axis: population (millions).]


The fit is evidently a good one, as is borne out by the smallness of the residual variance, 
but we must sound a warning as to the use of this polynomial. For interpolation in the 
variate range it would probably suit very well ; but for extrapolation outside the range 
it is dangerous unless there is good reason to suppose that the polynomial has some theoretical 
basis (which is not so). It would, for instance, be most unsafe to try and estimate the 
population in 1960 by inserting X = 9 in equation (22.78).
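
The working of this example may be reproduced as follows. The sketch is ours: it evaluates the polynomials from (22.74) with the origin at the middle of the range (so that $P_1$ takes the values -6 to 6) rather than taking them from tables, and the variable names are hypothetical.

```python
# Population of England and Wales (millions), 1811 to 1931, from Table 22.2.
Y = [10.16, 12.00, 13.90, 15.91, 17.93, 20.07, 22.71, 25.97,
     29.00, 32.53, 36.07, 37.89, 39.95]
n = len(Y)
n2 = n * n
X = [x - (n - 1) / 2 for x in range(n)]            # -6, -5, ..., 6

def polys(x):
    """P_1..P_4 of (22.74) with P_1 = x."""
    return [x,
            x**2 - (n2 - 1) / 12,
            x**3 - (3 * n2 - 7) / 20 * x,
            x**4 - (3 * n2 - 13) / 14 * x**2 + 3 * (n2 - 1) * (n2 - 9) / 560]

P = [polys(x) for x in X]
mean = sum(Y) / n
resid = sum((y - mean) ** 2 for y in Y)            # original variation

b, residuals = [], []
for j in range(4):
    s_yp = sum(y * row[j] for y, row in zip(Y, P))
    s_pp = sum(row[j] ** 2 for row in P)
    b.append(s_yp / s_pp)
    resid -= b[-1] * s_yp                          # contribution of this term
    residuals.append(resid)

# Divisor n - (order of term) - 1, the order being j + 1 in this indexing.
variances = [r / (n - j - 2) for j, r in enumerate(residuals)]
print(b)
print([round(r, 3) for r in residuals])
```

The residuals so obtained agree with the tabulated values to within a unit or so in the third decimal place, the difference arising from rounding in the hand calculation.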
Example 22.8

In Chapter 3 it was seen that factorial moments can be derived by summatory processes.
A somewhat similar method can be used to fit orthogonal polynomials. We will
illustrate it on the data of the previous example.

TABLE 22.3
Fitting of Orthogonal Polynomials by Factorial Sums.

      S_0         S_1          S_2          S_3
     10.16       10.16        10.16        10.16
     12.00       22.16        32.32        42.48
     13.90       36.06        68.38       110.86
     15.91       51.97       120.35       231.21
     17.93       69.90       190.25       421.46
     20.07       89.97       280.22       701.68
     22.71      112.68       392.90      1094.58
     25.97      138.65       531.55      1626.13
     29.00      167.65       699.20      2325.33
     32.53      200.18       899.38      3224.71
     36.07      236.25      1135.63      4360.34
     37.89      274.14      1409.77      5770.11
     39.95      314.09      1723.86      7493.97

                314.09      1723.86      7493.97

In Table 22.3 the column headed $S_0$ gives the value of Y. The next column, headed
$S_1$, gives the sums of the values in the first column proceeding from the top; and so for
the columns headed $S_2$ and $S_3$.
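Now construct the quantities

$$a_0 = \frac{1}{n}S_1 = \frac{314.09}{13} = 24.160{,}769$$

$$a_1 = \frac{2!}{n(n + 1)}S_2 = \frac{2\,(1723.86)}{13 \cdot 14} = 18.943{,}516$$

$$a_2 = \frac{3!}{n(n + 1)(n + 2)}S_3 = \frac{6\,(7493.97)}{13 \cdot 14 \cdot 15} = 16.470{,}264$$

($S_r$ here denoting the final sum in the column so headed), the general formula being

$$a_r = \frac{(r + 1)!}{n(n + 1)\ldots(n + r)}\,S_{r+1}. \tag{22.79}$$

Then obtain the quantities

$$a'_0 = a_0 = 24.160{,}769$$
$$a'_1 = a_0 - a_1 = 5.217{,}253$$
$$a'_2 = a_0 - 3a_1 + 2a_2 = 0.270{,}749,$$

the general formula being

$$a'_p = a_0 - \frac{p(p + 1)}{1 \cdot 2}\,a_1 + \frac{(p - 1)\,p\,(p + 1)(p + 2)}{1 \cdot 2 \cdot 2 \cdot 3}\,a_2 - \ldots \tag{22.80}$$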
Finally put

$$b_0 = a'_0 = 24.160{,}769$$

$$b_1 = \frac{6}{n - 1}\,a'_1 = \frac{6\,(5.217{,}253)}{12} = 2.608{,}626$$

$$b_2 = \frac{30}{(n - 1)(n - 2)}\,a'_2 = \frac{30\,(0.270{,}749)}{12 \cdot 11} = 0.061{,}53,$$

the general formula being

$$b_p = \frac{(2p + 1)!}{(p!)^2}\cdot\frac{a'_p}{(n - 1)(n - 2)\ldots(n - p)}. \tag{22.81}$$

Then the b's are the coefficients of the orthogonal polynomials in the regression equation.
The values we have found check with those of the previous example and the reader may
care to work out $b_3$ and $b_4$ by the same method.

This process is due to R. A. Fisher and avoids the direct calculation of the values of
the orthogonal polynomials. Its validity may be established by using equations (22.75)
and (22.73), which give

$$b_p = \frac{\Sigma YP_p}{\Sigma P_p^2} = \frac{(2p)!\,(2p + 1)!}{(p!)^4\,n(n^2 - 1)\ldots(n^2 - p^2)}\,\Sigma(YP_p)$$

$$= \frac{(2p + 1)!}{(p!)^2\,(n - 1)\ldots(n - p)}\sum_{j=0}^{p}\frac{(-1)^{p+j}(p + j)!}{(j!)^2\,(p - j)!}\cdot\frac{(n - j - 1)!\;\Sigma\,y\,x(x - 1)\ldots(x - j + 1)}{(n - p - 1)!\;n(n + 1)\ldots(n + p)}.$$

The first part of the expression explains the coefficients in (22.81), the second part those
in (22.80). The third part gives rise to (22.79) when it is remembered that the sums S
are expressible as sums of factorials (cf. 3.10, vol. I, p. 58), but the summation takes place
from the top of the column.
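
The summatory process may be sketched in code as follows. The routine is ours and the names `end_sums` and `fisher_coefficients` are hypothetical; the general term of (22.80) is taken to be $(-1)^r (p + r)!/\{(p - r)!\,r!\,(r + 1)!\}$, which reproduces the coefficients 1, -3, 2 quoted for $a'_2$ but is our own reading of the series.

```python
from math import factorial as fact

Y = [10.16, 12.00, 13.90, 15.91, 17.93, 20.07, 22.71, 25.97,
     29.00, 32.53, 36.07, 37.89, 39.95]

def end_sums(values, depth):
    """Final entries S_1, ..., S_depth of the successively cumulated columns."""
    col, ends = list(values), []
    for _ in range(depth):
        running, nxt = 0.0, []
        for v in col:                      # sum from the top of the column
            running += v
            nxt.append(running)
        col = nxt
        ends.append(col[-1])
    return ends

def fisher_coefficients(values, order):
    n = len(values)
    S = end_sums(values, order + 1)        # S[r] holds S_{r+1}
    a = []
    for r in range(order + 1):             # (22.79)
        denom = 1
        for k in range(r + 1):
            denom *= n + k
        a.append(fact(r + 1) * S[r] / denom)
    b = []
    for p in range(order + 1):
        a_dash = sum((-1) ** r * fact(p + r)
                     / (fact(p - r) * fact(r) * fact(r + 1)) * a[r]
                     for r in range(p + 1))            # (22.80)
        denom = 1
        for k in range(1, p + 1):
            denom *= n - k
        b.append(fact(2 * p + 1) / fact(p) ** 2 * a_dash / denom)   # (22.81)
    return b

b = fisher_coefficients(Y, 3)
print(b)
```

The first three values agree with $b_0$, $b_1$ and $b_2$ above, and the fourth with the $b_3$ of Example 22.7.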


Example 22.9 

As a rule it is unnecessary to evaluate the polynomial at all the points for which data 
are given; but if the values are desired for comparison with observation they may be 
obtained by summatory processes from the differences. 

The terminal differences themselves are obtainable simply from the quantities $a'_p$ of
the previous example. For a polynomial of the first degree we have

$$\Delta^1 Y = -\frac{6}{n - 1}\,a'_1, \qquad Y = a'_0 + 3a'_1. \tag{22.82}$$

For that of the second degree,

$$\Delta^2 Y = \frac{60}{(n - 1)(n - 2)}\,a'_2, \qquad \Delta^1 Y = -\frac{6}{n - 1}\,(a'_1 + 5a'_2), \qquad Y = a'_0 + 3a'_1 + 5a'_2. \tag{22.83}$$

For the third degree,

$$\Delta^3 Y = -\frac{840}{(n - 1)(n - 2)(n - 3)}\,a'_3, \qquad \Delta^2 Y = \frac{60}{(n - 1)(n - 2)}\,(a'_2 + 7a'_3),$$
$$\Delta^1 Y = -\frac{6}{n - 1}\,(a'_1 + 5a'_2 + 14a'_3), \qquad Y = a'_0 + 3a'_1 + 5a'_2 + 7a'_3. \tag{22.84}$$

The formulae for higher degrees are constructed on analogous lines, the multiplying
factors for successive differences being given by

$$(-1)^p\,\frac{(p + 1)(p + 2)\ldots(2p + 1)}{(n - 1)(n - 2)\ldots(n - p)}$$

and the coefficients of the a's by

               a'_0   a'_1   a'_2   a'_3   a'_4   a'_5
    Y            1      3      5      7      9     11
    Δ^1 Y               1      5     14     30     55
    Δ^2 Y                      1      7     27     77   etc.
    Δ^3 Y                             1      9     44
    Δ^4 Y                                    1     11
    Δ^5 Y                                           1

We leave the proof of these results to the reader.
For instance, for the data considered in the two previous examples we found, for the
parabola of the second degree,

$$Y = 24.160{,}8 + 2.608{,}6X + 0.061{,}533\,(X^2 - 14)$$
$$a'_0 = 24.160{,}769; \quad a'_1 = 5.217{,}253; \quad a'_2 = 0.270{,}749.$$

Hence, from (22.83),

$$\Delta^2 Y = \frac{60}{(n - 1)(n - 2)}\,a'_2 = 0.123{,}068$$

$$\Delta^1 Y = -\frac{6}{n - 1}\,(a'_1 + 5a'_2) = -3.285{,}499$$

$$Y = a'_0 + 3a'_1 + 5a'_2 = 41.166{,}273.$$

We then build up the polynomial values as shown in Table 22.4. The second difference
0.123,068 is shown at the foot of column (2). Being a constant, it could have been written
all the way up, but to do so is a waste of time (and in practice, of course, we should not
devote a separate column to it). The first difference is shown at the foot of column (3),
and the figures above it constructed by adding the second difference at each stage. The
polynomial values themselves are compiled by adding the first differences to the value
at the foot of the column, 41.166,27.

TABLE 22.4
Calculation of Polynomial Values from Differences.

      (1)          (2)            (3)            (4)           (5)          (6)
    Number of     Second         First        Polynomial     Observed    Difference
      Term.     Difference.    Difference.      Value.        Value.      (5)-(4)
        1                      -1.808,68         9.863        10.16        0.297
        2                      -1.931,75        11.795        12.00        0.205
        3                      -2.054,82        13.849        13.90        0.051
        4                      -2.177,88        16.027        15.91       -0.117
        5                      -2.300,95        18.328        17.93       -0.398
        6                      -2.424,02        20.752        20.07       -0.682
        7                      -2.547,09        23.299        22.71       -0.589
        8                      -2.670,16        25.969        25.97        0.001
        9                      -2.793,23        28.763        29.00        0.237
       10                      -2.916,29        31.679        32.53        0.851
       11                      -3.039,36        34.718        36.07        1.352
       12                      -3.162,43        37.881        37.89        0.009
       13        0.123,068     -3.285,499       41.166,27     39.95       -1.216

We have also shown the observed values and the difference between polynomial and 
observed values. The sum of squares of the latter is 5-204, agreeing within the margin 
of rounding-up error with the value for the sum of squares of residuals found in 
Example 22.7. 

As an exercise the reader should work out the polynomial values for the third- and 
fourth-order polynomials and compare the sum of squares of residuals with the values of 
Example 22.7. 
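
The building up of Table 22.4 may be sketched as follows for the second-degree case. The routine is ours; it takes the rounded values of $a'_0$, $a'_1$, $a'_2$ quoted in Example 22.8 as given, so that its results carry the same rounding as the hand calculation.

```python
a0, a1, a2 = 24.160769, 5.217253, 0.270749      # a'_0, a'_1, a'_2 of Example 22.8
n = 13
observed = [10.16, 12.00, 13.90, 15.91, 17.93, 20.07, 22.71, 25.97,
            29.00, 32.53, 36.07, 37.89, 39.95]

d2 = 60 * a2 / ((n - 1) * (n - 2))              # constant second difference
d1 = -6 * (a1 + 5 * a2) / (n - 1)               # first difference at the last term
value = a0 + 3 * a1 + 5 * a2                    # polynomial value at the last term

values = [value]
for _ in range(n - 1):
    value += d1                                 # move up one row of the table
    d1 += d2                                    # first difference for the next row up
    values.append(value)
values.reverse()                                # first term at the top, as in the table

residual_ss = sum((o - v) ** 2 for o, v in zip(observed, values))
print([round(v, 3) for v in values])
print(round(residual_ss, 3))
```

For the third and fourth degrees the same loop applies with one or two further differences carried, their terminal values being taken from (22.84) and the table of coefficients above.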


Multiple Curvilinear Regression 

22.26. We considered the linear regression of one variate on a number of others 
in Chapters 14 and 15. There now remains the extension of our results to the 
curvilinear case. 

The extension is very easy to carry out when we remember that in multiple linear
regression there is no restriction on the degree of dependence among the "independent"
variates. In particular, some of them may be functionally related, and more particularly
still, one variate may be a power of another. It is thus clear that the process of fitting
curved regression lines can be regarded as formally equivalent to that of fitting linear
regressions. For instance, the fitting of

$$Y = a_0 + a_1X_1 + a_2X_2 + a_3X_3 + a_4X_4 + a_5X_5$$

is equivalent to

$$Y = a_0 + a_1X_1 + a_2X_1^2 + a_3Z_1 + a_4Z_1^2 + a_5Z_1^3,$$

the latter being a particular case of the former where $X_2$ is the square of $X_1$ (and their
covariation accordingly complete) and similar relations exist between $X_3$, $X_4$ and $X_5$.

The case of curvilinear regression for a single variate, which has occupied the foregoing
part of the chapter, could then have been treated by the methods of Chapter 15.
We have discussed it afresh only because it is more easily dealt with by direct methods.
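
To illustrate the equivalence, the parabola (22.43) of Example 22.3 may be recovered by treating $X/100$ and its square as two "independent" variates in an ordinary multiple linear regression. The sketch below is ours: the function names are hypothetical, and the normal equations are solved exactly by elimination in rational arithmetic rather than by determinants.

```python
from fractions import Fraction as F

def solve(matrix, rhs):
    """Solve a small linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    m = [[F(v) for v in row] + [F(r)] for row, r in zip(matrix, rhs)]
    for i in range(n):
        pivot = next(r for r in range(i, n) if m[r][i] != 0)
        m[i], m[pivot] = m[pivot], m[i]
        m[i] = [v / m[i][i] for v in m[i]]
        for r in range(n):
            if r != i:
                m[r] = [vr - m[r][i] * vi for vr, vi in zip(m[r], m[i])]
    return [row[-1] for row in m]

def multiple_regression(columns, y):
    """Least-squares constants for y on a constant term and the given variates."""
    cols = [[F(1)] * len(y)] + [[F(v) for v in c] for c in columns]
    normal = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * F(b) for a, b in zip(ci, y)) for ci in cols]
    return solve(normal, rhs)

X = [100, 105, 110, 115, 121, 132, 144, 153, 163, 179, 191, 203, 212, 226, 237, 251]
Y = ["3.71", "3.81", "3.86", "3.93", "3.96", "4.20", "4.34", "4.51", "4.73",
     "5.35", "5.74", "6.14", "6.51", "6.98", "7.44", "7.76"]

x1 = [F(x, 100) for x in X]                 # X/100
x2 = [v * v for v in x1]                    # its square, treated as a second variate
line = [float(c) for c in multiple_regression([x1], Y)]
parabola = [float(c) for c in multiple_regression([x1, x2], Y)]
print(line, parabola)
```

The constants obtained for the straight line and the parabola may be compared with (22.42) and (22.43).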


22.27. In multiple regression analysis it sometimes happens that, having worked out 
a regression equation, we wish either to take account of a new factor or to remove one 
which appears redundant. To avoid the necessity of solving a new set of determinantal 


equations the following device is useful :— 
Consider the case of three independent variates measured from their mean

$$Y = b_1X_1 + b_2X_2 + b_3X_3. \tag{22.85}$$

In accordance with our general method the constants b are given by

$$\begin{aligned}
b_1\Sigma(x_1^2) + b_2\Sigma(x_1x_2) + b_3\Sigma(x_1x_3) &= \Sigma(x_1y) \\
b_1\Sigma(x_1x_2) + b_2\Sigma(x_2^2) + b_3\Sigma(x_2x_3) &= \Sigma(x_2y) \\
b_1\Sigma(x_1x_3) + b_2\Sigma(x_2x_3) + b_3\Sigma(x_3^2) &= \Sigma(x_3y)
\end{aligned} \tag{22.86}$$

Suppose now we replace the functions $\Sigma(xy)$ on the right by 1, 0, 0 and obtain the solutions
$b_1 = c_{11}$, $b_2 = c_{12}$, $b_3 = c_{13}$; and similarly for replacement by 0, 1, 0 and 0, 0, 1,
the solutions being written

$$\begin{aligned}
b_1 &= c_{11},\ c_{12},\ c_{13} \\
b_2 &= c_{12},\ c_{22},\ c_{23} \\
b_3 &= c_{13},\ c_{23},\ c_{33}
\end{aligned} \tag{22.87}$$

Then the solution of (22.86) is

$$\begin{aligned}
b_1 &= c_{11}\Sigma(x_1y) + c_{12}\Sigma(x_2y) + c_{13}\Sigma(x_3y) \\
b_2 &= c_{12}\Sigma(x_1y) + c_{22}\Sigma(x_2y) + c_{23}\Sigma(x_3y) \\
b_3 &= c_{13}\Sigma(x_1y) + c_{23}\Sigma(x_2y) + c_{33}\Sigma(x_3y)
\end{aligned} \tag{22.88}$$

as is immediately evident on substitution. The values of the c's are those we have denoted
earlier in the chapter by determinantal forms, e.g. $c_{jk} = \Delta_{jk}/\Delta$.


22.28. Now suppose that we wish to discard the variate $x_3$. From (22.86), with
1, 0, 0 written on the right, we find

$$c_{11} = \frac{1}{\Delta}\begin{vmatrix} 1 & (12) & (13) \\ 0 & (22) & (23) \\ 0 & (23) & (33) \end{vmatrix}, \qquad
c_{12} = \frac{1}{\Delta}\begin{vmatrix} (11) & 1 & (13) \\ (12) & 0 & (23) \\ (13) & 0 & (33) \end{vmatrix}, \tag{22.89}$$

where $(jk)$ stands for $\Sigma(x_j x_k)$, and

$$\Delta = \begin{vmatrix} (11) & (12) & (13) \\ (12) & (22) & (23) \\ (13) & (23) & (33) \end{vmatrix}. \tag{22.90}$$

There are similar expressions for the other c's. If the values of the constants when $x_3$
is removed are $c'_{11}$, $c'_{12}$, $c'_{22}$, we shall have

$$c'_{11} = \frac{1}{\Delta'}\begin{vmatrix} 1 & (12) \\ 0 & (22) \end{vmatrix}, \qquad
c'_{12} = \frac{1}{\Delta'}\begin{vmatrix} (11) & 1 \\ (12) & 0 \end{vmatrix}, \quad \text{etc.,} \tag{22.91}$$

where

$$\Delta' = \begin{vmatrix} (11) & (12) \\ (12) & (22) \end{vmatrix}. \tag{22.92}$$

Now we have

$$c_{33} = \frac{\Delta'}{\Delta}, \qquad c_{12}\,c_{33} - c_{13}\,c_{23} = -\frac{(12)}{\Delta},$$

the second relation following on expansion of the determinants. Thus

$$c_{12} - \frac{c_{13}\,c_{23}}{c_{33}} = \frac{c_{12}\,c_{33} - c_{13}\,c_{23}}{c_{33}} = -\frac{(12)}{\Delta'} = c'_{12}. \tag{22.93}$$

Similarly

$$c'_{11} = c_{11} - \frac{c_{13}^2}{c_{33}}, \tag{22.94}$$

$$c'_{22} = c_{22} - \frac{c_{23}^2}{c_{33}}. \tag{22.95}$$


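This gives us the new c's in terms of the old. Denoting similarly the new b's by primes,
we have

$$\begin{aligned}
b_1 - b'_1 &= (c_{11} - c'_{11})\,\Sigma(x_1 y) + (c_{12} - c'_{12})\,\Sigma(x_2 y) + c_{13}\,\Sigma(x_3 y) \\
&= \frac{c_{13}}{c_{33}} \left\{ c_{13}\,\Sigma(x_1 y) + c_{23}\,\Sigma(x_2 y) + c_{33}\,\Sigma(x_3 y) \right\} \\
&= \frac{c_{13}\, b_3}{c_{33}}.
\end{aligned}$$

Hence we have

$$b'_1 = b_1 - \frac{c_{13}\, b_3}{c_{33}}, \qquad b'_2 = b_2 - \frac{c_{23}\, b_3}{c_{33}}, \tag{22.96}$$

expressing the new constants in terms of the old and the known constants c.
Finally, the contribution to the sum of squares due to the variate $x_3$ is

$$\begin{aligned}
b_1 \Sigma(x_1 y) &+ b_2 \Sigma(x_2 y) + b_3 \Sigma(x_3 y) - b'_1 \Sigma(x_1 y) - b'_2 \Sigma(x_2 y) \\
&= \frac{c_{13}\, b_3}{c_{33}}\,\Sigma(x_1 y) + \frac{c_{23}\, b_3}{c_{33}}\,\Sigma(x_2 y) + b_3\,\Sigma(x_3 y) \\
&= \frac{b_3^2}{c_{33}}.
\end{aligned} \tag{22.97}$$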
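As a check on the formulae for discarding a variate, the following sketch compares the adjusted constants with those obtained by re-solving the reduced equations directly. The sums of squares and products are invented for illustration, and the helper names `invert` and `drop_last` are hypothetical.

```python
def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [u / d for u in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [u - f * w for u, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def drop_last(c, b):
    """Adjust the c's and b's when the last variate is discarded."""
    p = len(b) - 1
    cpp = c[p][p]
    c_new = [[c[j][k] - c[j][p] * c[k][p] / cpp for k in range(p)] for j in range(p)]
    b_new = [b[j] - c[j][p] * b[p] / cpp for j in range(p)]
    return c_new, b_new


# Hypothetical sums (jk) = sum of x_j x_k, and sums of x_j y, for three variates.
s = [[10.0, 2.0, 1.0], [2.0, 8.0, 3.0], [1.0, 3.0, 6.0]]
sy = [4.0, 5.0, 2.0]

c = invert(s)
b = [sum(c[j][k] * sy[k] for k in range(3)) for j in range(3)]

# Adjusted constants after discarding x_3.
c_new, b_new = drop_last(c, b)

# The same constants obtained by re-solving the two-variate equations.
c_direct = invert([row[:2] for row in s[:2]])
b_direct = [sum(c_direct[j][k] * sy[k] for k in range(2)) for j in range(2)]

# Contribution of x_3 to the sum of squares: b_3^2 / c_33.
reduction = b[2] ** 2 / c[2][2]
direct_reduction = sum(b[j] * sy[j] for j in range(3)) - sum(b_new[j] * sy[j] for j in range(2))
print(c_new, b_new, reduction)
```

The adjusted and directly computed constants agree to rounding error, and so do the two forms of the contribution of $x_3$ to the sum of squares.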


22.29. Generally, if there are p independent variates the equations for the b's are

$$\begin{aligned}
b_1 \Sigma(x_1^2) + b_2 \Sigma(x_1 x_2) + \ldots + b_p \Sigma(x_1 x_p) &= \Sigma(y\, x_1) \\
&\;\;\vdots \\
b_1 \Sigma(x_1 x_p) + b_2 \Sigma(x_2 x_p) + \ldots + b_p \Sigma(x_p^2) &= \Sigma(y\, x_p)
\end{aligned}$$

If $x_p$ is omitted the equations become $(p - 1)$ in number in variables $b'_1 \ldots b'_{p-1}$.
Subtracting from these the first $(p - 1)$ of the above equations we find $(p - 1)$ equations,
typified by

$$(b'_1 - b_1)\,\Sigma(x_1 x_j) + (b'_2 - b_2)\,\Sigma(x_2 x_j) + \ldots + (b'_{p-1} - b_{p-1})\,\Sigma(x_{p-1} x_j) - b_p\,\Sigma(x_j x_p) = 0. \tag{22.98}$$

But these equations are the same as those for the coefficients $c_{1p} \ldots c_{pp}$ with $(b'_1 - b_1)$
in place of $c_{1p}$, etc., and $-b_p$ in place of $c_{pp}$. Hence

$$\frac{b'_1 - b_1}{c_{1p}} = \ldots = \frac{-b_p}{c_{pp}},$$

or

$$b'_j = b_j - \frac{c_{jp}\, b_p}{c_{pp}}. \tag{22.99}$$

Similarly it will be found that

$$\begin{aligned}
c'_{11} - c_{11} &= -\frac{c_{1p}^2}{c_{pp}} \\
c'_{12} - c_{12} &= -\frac{c_{1p}\, c_{2p}}{c_{pp}}
\end{aligned} \tag{22.100}$$

with similar equations for the other c's.


22.30. Somewhat similar results apply when a variate is added. If primes again
refer to new coefficients when $x_q$ is added, we have, as above:

$$\begin{aligned}
b'_1 - b_1 &= \frac{c'_{1q}\, b'_q}{c'_{qq}} \\
c'_{11} - c_{11} &= \frac{c'^{\,2}_{1q}}{c'_{qq}} \\
c'_{12} - c_{12} &= \frac{c'_{1q}\, c'_{2q}}{c'_{qq}}
\end{aligned} \tag{22.101}$$

In order to use these equations to adjust the constants we require $c'_{1q} \ldots c'_{qq}$ and $b'_q$.

By writing down the equations satisfied by $c'_{11} \ldots c'_{1p}$ and subtracting the corresponding
equations in $c_{11} \ldots c_{1p}$ we get p equations such as

$$(c'_{11} - c_{11})\,\Sigma(x_1 x_j) + \ldots + (c'_{1p} - c_{1p})\,\Sigma(x_p x_j) = -c'_{1q}\,\Sigma(x_j x_q).$$

These are the same as the equations in $b_1 \ldots b_p$ with $-c'_{1q}\,\Sigma(x_j x_q)$ instead of $\Sigma(x_j y)$
on the right, and hence

$$c'_{1k} - c_{1k} = -c'_{1q} \sum_{j=1}^{p} c_{jk}\,\Sigma(x_j x_q).$$

Thus, using (22.101),

$$\frac{c'_{kq}}{c'_{qq}} = -\sum_{j=1}^{p} c_{jk}\,\Sigma(x_j x_q). \tag{22.102}$$

The last of the equations satisfied by $c'_{1q} \ldots c'_{qq}$ is

$$c'_{1q}\,\Sigma(x_1 x_q) + \ldots + c'_{pq}\,\Sigma(x_p x_q) + c'_{qq}\,\Sigma(x_q^2) = 1.$$

Substituting for $c'_{1q}$, etc., in terms of $c'_{qq}$ we get

$$c'_{qq} \left\{ \Sigma(x_q^2) - \sum_{j,k=1}^{p} c_{jk}\,\Sigma(x_j x_q)\,\Sigma(x_k x_q) \right\} = 1. \tag{22.103}$$

This gives $c'_{qq}$, and $c'_{1q} \ldots c'_{pq}$ are derivable from (22.102). The other constants then
result from (22.101).

Cochran (1938a), to whom this proof is due, says that the elimination of two variates
is best carried out in two stages of one each; that where one variate is eliminated the
method is quicker than re-solving the regression equations, except where there are only
two independent variates in the first instance; and that if two variates are being eliminated
the method is quicker if the original number of independent variates is six or more. For
the addition of variates the method is in all cases more expeditious than re-solving the
regression equations.
Example 22.10 (Cochran, 1938a)

In a study of the effect of weather factors on the number of noctuid moths per night
caught in a light-trap, regressions were worked out on $X_1$ (minimum night temperature),
$X_2$ (the maximum temperature of the previous day), $X_3$ (the average speed of the wind
during the night), and $X_4$ (the amount of rain during the night). The dependent variate
was log (1 + n), where n was the number of moths.

It was subsequently decided to investigate the effect of cloudiness, measured on a
conventional scale as the percentage of starlight obscured by clouds in a night sky camera.
This is the new variate $X_5$.

The quantities $c_{jk}$ for the first four variates were:

            X_1               X_2               X_3               X_4
X_1   + 0.105,423,56    - 0.041,946,20    - 0.096,067,09    - 0.018,490,96
X_2        ...          + 0.086,038,69    + 0.033,172,71    + 0.012,903,58
X_3        ...               ...          + 0.572,652,01    + 0.008,116,62
X_4        ...               ...               ...          + 0.062,275,32

and the sums $\Sigma(x_j x_5)$ were

$\Sigma(x_1 x_5) = -4.867$, $\Sigma(x_2 x_5) = +0.206$, $\Sigma(x_3 x_5) = -0.5446$,
$\Sigma(x_4 x_5) = -5.42$, $\Sigma(x_5^2) = 7.87$.

We then find from (22.103)

$$c'_{55} = +0.210,133,14,$$

and from (22.102)

$$\frac{c'_{15}}{c'_{55}} = +0.369,198,24, \quad \frac{c'_{25}}{c'_{55}} = -0.133,872,86, \quad \frac{c'_{35}}{c'_{55}} = -0.118,533,74, \quad \frac{c'_{45}}{c'_{55}} = +0.249,298,91,$$

so that the new c's are given by (22.101) as

            X_1               X_2               X_3               X_4               X_5
X_1   + 0.134,066,25    - 0.052,332,16    - 0.105,263,03    + 0.000,849,84    + 0.077,580,79
X_2        ...          + 0.089,804,68    + 0.036,507,20    + 0.005,890,52    - 0.028,131,12
X_3        ...               ...          + 0.575,604,43    + 0.001,907,12    - 0.024,907,87
X_4        ...               ...               ...          + 0.075,335,08    + 0.052,385,96
X_5        ...               ...               ...               ...          + 0.210,133,14

The original regression coefficients were

$b_1 = +0.198,140,7$, $b_2 = +0.038,528,4$, $b_3 = -0.508,649,2$, $b_4 = +0.031,848,2$.

We now find

$$b'_5 = \sum_{j=1}^{5} \left\{ c'_{j5}\,\Sigma(x_j y) \right\} = -0.227,149,6,$$

and from (22.101) we then have

$b'_1 = +0.114,277,5$, $b'_2 = +0.068,937,6$, $b'_3 = -0.481,724,3$, $b'_4 = -0.024,779,9$.

As usual we have retained more figures than are necessary, in order to avoid cumulating
errors and to facilitate the detection of computational slips.
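The arithmetic of this example can be retraced in a few lines. The sketch below rests on two assumptions. First, it takes $\Sigma(x_1 x_5) = -4.867$ and $c_{13} = -0.096,067,09$, these being the readings that reproduce the printed results. Second, since the sums $\Sigma(x_j y)$ are not given in the example, the quoted value of $b'_5$ is used as it stands. Variable names are illustrative only.

```python
# c_jk for the first four variates (symmetric matrix).
c = [
    [0.10542356, -0.04194620, -0.09606709, -0.01849096],
    [-0.04194620, 0.08603869, 0.03317271, 0.01290358],
    [-0.09606709, 0.03317271, 0.57265201, 0.00811662],
    [-0.01849096, 0.01290358, 0.00811662, 0.06227532],
]
s5 = [-4.867, 0.206, -0.5446, -5.42]   # sums of x_j x_5 for j = 1..4
s55 = 7.87                             # sum of x_5 squared
b = [0.1981407, 0.0385284, -0.5086492, 0.0318482]
b5_new = -0.2271496                    # quoted in the example, not recomputed here

p = len(b)

# (22.102): ratios c'_k5 / c'_55.
ratios = [-sum(c[j][k] * s5[j] for j in range(p)) for k in range(p)]

# (22.103): c'_55.
quad = sum(c[j][k] * s5[j] * s5[k] for j in range(p) for k in range(p))
c55_new = 1.0 / (s55 - quad)

# (22.101): remaining new c's and the adjusted regression coefficients.
c_k5 = [r * c55_new for r in ratios]
c_new = [[c[j][k] + c_k5[j] * c_k5[k] / c55_new for k in range(p)] for j in range(p)]
b_new = [b[j] + ratios[j] * b5_new for j in range(p)]

print(c55_new)
print(ratios)
print(b_new)
```

Under these assumptions the computed values agree with those printed in the example to about the seventh decimal place.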
22.31. The constants c found in the foregoing method have a further use: they
give the standard errors of the regression coefficients and provide some of the functions
required in more exact tests based on the t-distribution. If, measuring y about the mean,
we have

$$Y = b_1 X_1 + b_2 X_2 + \ldots + b_p X_p$$

then there are p equations of the kind

$$\Sigma(x_1 y) = b_1 \Sigma(x_1^2) + b_2 \Sigma(x_1 x_2) + \ldots + b_p \Sigma(x_1 x_p);$$

and thus, recalling the definition of the c's, we have

$$b_1 = c_{11}\,\Sigma(x_1 y) + c_{12}\,\Sigma(x_2 y) + \ldots + c_{1p}\,\Sigma(x_p y).$$

Thus, for fixed values of the x's,

$$\begin{aligned}
\operatorname{var} b_1 &= \operatorname{var} y \left( \sum_{j,k} c_{1j}\, c_{1k}\, \Sigma(x_j x_k) \right) \\
&= c_{11} \operatorname{var} y
\end{aligned} \tag{22.104}$$

and so for the other b's.
For large samples var y may be taken to be the estimated variance

$$\frac{1}{n}\,\Sigma (y - Y)^2.$$
If the sample is small and it is desired to make a more accurate test, then we have,
by an extension of 22.21, that

$$t = \frac{(b_1 - \beta_1)\,\sqrt{(n - p - 1)}}{\sqrt{\Sigma (y - Y)^2}\;\sqrt{c_{11}}} \tag{22.105}$$

is distributed in “Student's” form with $\nu = n - p - 1$ degrees of freedom.


22.32. As a final comment we may emphasise that regression equations are only 
polynomials fitted to the means of arrays, and consequently that if the scatter about 
those means is substantial they are not very reliable as estimators (though they may be 
better than other methods). The comment would hardly be necessary were it not for a 
tendency to use the equations somewhat uncritically for purposes of prediction. The 
point assumes even greater importance when attempts are made to estimate the dependent 
variate for values of the independent variates outside the range on which the regressions 
are based; or again, if the observations are distributed over time so that the population 
may be changing while the sample is being drawn. The technique of regression analysis 
is undoubtedly useful in many fields, but—as with many other statistical techniques— 
the careful investigator will apply it with a certain amount of self-discipline. 


NOTES AND REFERENCES 


The theory of curvilinear regression was studied by Karl Pearson (1905). Orthogonal 
polynomials had been considered, and the essential problems solved, by Tchebycheff as 
far back as 1857, but their use in statistics was not fully appreciated until about sixty years 
later. Pearson gave in 1921 the general formulae for fitting curved regression lines up to 
the fourth order. Neyman (1926) pointed out the elegance of the determinantal approach. 

From about 1920 onwards there may be discerned two main lines of development. 
The Scandinavian school, led by Wicksell, has developed the analytical theory of regression 
; see Wicksell (1917b, 1933, 1934b) and a useful memoir by W. Andersson (1932). The
second line, followed by Fisher, Aitken and others, has been concerned with the fitting of
regression curves to arithmetical data and exact significance tests—see Fisher's papers of 
1921b, 1922b, 1924b, 1926a, a paper by Allan (1930), and three papers by Aitken (1933a, 
b, c). The literature on orthogonal polynomials is now very large. 

For some illustrative material, see K. Pearson (1905), Andersson (1932), and Pretorius 
(1930). See also references to Chapters 14 and 15. 


EXERCISES 


22.1. Show that the regression of y on the variance of x (the scedastic curve) is
given by 


Sayeed i D' g (X) Di~ g | 
roA a ae 2-25 ee ) eo sea 7d) zm 


8-0 


j 
where Po 
1 41 
(Wicksell, 1934b.)


22.2. Show that if the regression of y on the mean of z is linear, then from (22.11) 


is a linear function of ¢ (tı) and zy ET t,). Hence that 


Kjj Kao = Kn Kje1,0 


(Wicksell, 1934b.)

22.3. Show that if the marginal distribution of a bivariate distribution is of the


Gram-Charlier Type A: 
f=a(a){l+a,H,+a,H,+...} 


the regression of y on x is 
; E Shatin 


1+ Dra y H; (X) 


ja 


y 


(Wicksell, 1917b.) 


22.4. Transforming the orthogonal polynomials of (22.74) to a new variate 
EINE ER note that P, — £P5-1i8 8 numerical multiple of Pj, say ÀP,..,. Show 


that 


and deduce the recurrence relation, 
Ba (p — 1) m- p-p 
PoPaa — “4 @p —1) Gp — 38) 
(Allan, 1930. The relation is due to Tchebycheff.) 
22.5. A regression line

Y =a,+a,X +a, X? +a, X3 +a, X* 

is fitted to normal data and the number of observations N is large. 
“2 

between the variates and c = 2 (the moments referring to the z-variate), show that 


Tf r is the correlation 


var dy = A (45 + 30c? — 8c* + c*) (1 — r°) 


vara, = Moms (15 + 30c — 15c? + 4c?) (1 — r°) 
2 


var d, = Na (4 — 3c + 3c) (1 — r?) 
2 


vara, = ea (1 + 4c) (1 — r°) 
2 

ever Yep? 

YR (1 — r°’). 


var Q, = 
(Andersson, 1932.) 


22.6. In the notation of 22.31 show that

$$\operatorname{cov}(b_1, b_2) = c_{12} \operatorname{var} y$$

and hence show how to test the difference of two coefficients in a regression equation.


22.7. Show how to derive a test of the significance of the difference of corresponding
regression coefficients in two equations derived from independent samples, based on the
result of 21.26.


CHAPTER 23 
THE ANALYSIS OF VARIANCE—(1) 


23.1. At various points in this book we have encountered in different guises the 
result that the sum of squares of a set of observations about their mean can be represented 
as the sum of two independent sums of squares, each of which provides an estimate of 
the parent variance; and that their ratio provides a test of homogeneity, at least when 
the parent is normal. We now proceed to study in more detail a method of statistical 
analysis with considerable generality which springs from this result. In view of the com- 
plexity of the general case we shall begin by considering simpler cases under somewhat 
restrictive conditions and shall extend our results stage by stage. 


One-way Classification 
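23.2. Suppose we have a set of variate-values divided into p families:

$$\begin{matrix}
x_{11} & x_{21} & \ldots & x_{n_1 1} \\
x_{12} & x_{22} & \ldots & x_{n_2 2} \\
\vdots & & & \vdots \\
x_{1p} & x_{2p} & \ldots & x_{n_p p}
\end{matrix}$$

Denoting by $\bar{x}$ the mean of the whole set and by $\bar{x}_j$ the mean of the values in the jth family,
we have the identity

$$\sum_{i,j} (x_{ij} - \bar{x})^2 = \sum_{i,j} (x_{ij} - \bar{x}_j)^2 + \sum_{i,j} (\bar{x}_j - \bar{x})^2, \tag{23.1}$$

since the cross-product term $\sum_{i,j} (x_{ij} - \bar{x}_j)(\bar{x}_j - \bar{x})$ vanishes. We may also write this as

$$\sum_{i,j} (x_{ij} - \bar{x})^2 = \sum_{i,j} (x_{ij} - \bar{x}_j)^2 + \sum_{j} n_j (\bar{x}_j - \bar{x})^2, \tag{23.2}$$

where $n_j$ is the number of members in the jth family.

It will also be convenient, from the point of view of a later generalisation, to write
the mean of the jth family as $x_{.j}$ and that of the whole as $x_{..}$, the periods in the subscripts
showing which factor is being averaged. We have then the alternative form

$$\sum_{i,j} (x_{ij} - x_{..})^2 = \sum_{i,j} (x_{ij} - x_{.j})^2 + \sum_{j} n_j (x_{.j} - x_{..})^2. \tag{23.3}$$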


23.3. The problem we shall discuss in connection with families of values of this type 
takes some such form as the following : the members of each family are randomly chosen 
from some parent population corresponding to that family. The populations themselves 
are, as a rule, defined by some prior system of classification given among the data of the 
problem, e.g. they might be different varieties of wheat, the x's being the yields of the
varieties grown under similar conditions, or they might be defined by income levels and
the x's the expenditure on food of a sample chosen from the different income groups. We
now ask: is there any evidence that the factor measured by x varies significantly from
family to family? Alternatively, can the data be regarded as homogeneous, i.e. as emanating
from populations which are identical so far as concerns the factor measured by x?
Further, when the question of significance is decided, how can we estimate the variation
of x in families or groups of families, and how can we estimate the magnitude of any
differences which exist?


23.4. We will assume, until further notice, that within each family the variation 
is normal with variance v, and that v is the same for each family. In later sections we 
shall endeavour to remove these rather restrictive conditions. On our present hypothesis 
the populations corresponding to the different families can differ, if at all, only in their 
means, and our first question is whether the sample values afford any evidence of such 
differences. 

Let us take as our hypothesis that the parent populations have a common mean m. 
Then we recall the following facts :— 


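(1) The sum $\frac{1}{v} \sum_{i,j} (x_{ij} - x_{..})^2$ is distributed in the Type III form of $\chi^2$ with
$N - 1 = \Sigma(n_j) - 1$ degrees of freedom, that is to say as the sum of squares of $N - 1$
independent normal variates with zero mean and unit variance.

(2) In any given family $x_{.j}\sqrt{(n_j / v)}$ is distributed normally with unit variance about
mean $m\sqrt{(n_j / v)}$, and is independent of the sum $\frac{1}{v} \sum_{i} (x_{ij} - x_{.j})^2$, which is itself distributed as $\chi^2$
with $n_j - 1$ degrees of freedom.

Since on our hypothesis the observations may be regarded as a single sample from
the same population, it follows that

$$\left.\begin{aligned}
&\frac{1}{v} \sum_{i,j} (x_{ij} - x_{..})^2 \text{ is distributed as } \chi^2 \text{ with } N - 1 \text{ d.f.} \\
&\frac{1}{v} \sum_{i,j} (x_{ij} - x_{.j})^2 \text{ is distributed as } \chi^2 \text{ with } \Sigma(n_j - 1) = N - p \text{ d.f.} \\
&\frac{1}{v} \sum_{j} n_j (x_{.j} - x_{..})^2 \text{ is distributed as } \chi^2 \text{ with } p - 1 \text{ d.f.}
\end{aligned}\right\} \tag{23.4}$$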


The only statement requiring any proof is the last. It may be proved directly (see Exercise 
23.1), but we shall deduce it as the corollary of a general theorem due to R. A. Fisher which 
will often be required in this chapter. 


23.5. Suppose we have q variates $x_1 \ldots x_q$ which are independently and normally
distributed with unit variance about the same mean, which we may assume to be
zero. Put

$$\zeta_j = \sum_{k=1}^{q} \lambda_{jk}\, x_k, \qquad j = 1, \ldots, q. \tag{23.5}$$

If we choose the coefficients $\lambda$ so that

$$\sum_{k=1}^{q} \lambda_{jk} \lambda_{lk} = \begin{cases} 1, & j = l \\ 0, & j \neq l \end{cases} \tag{23.6}$$

then each $\zeta$ is distributed normally with unit variance independently of the others. There
are $q^2$ coefficients $\lambda$, and the equations (23.6) impose $\tfrac{1}{2} q (q + 1)$ conditions on them, so that
the $\lambda$'s can always be found in a multiplicity of ways. In effect they correspond to the
rotation of orthogonal co-ordinate axes in a q-dimensional space.

Now suppose that we have h linear functions of the x's, $\zeta_1 \ldots \zeta_h$ ($h < q$), whose
coefficients obey the orthogonality relations (23.6). These h variates are then distributed
independently, normally and with unit variance.

It is now possible to find $q - h$ further variates $\zeta_{h+1} \ldots \zeta_q$ which are orthogonal
among themselves and to $\zeta_1 \ldots \zeta_h$. Geometrically this is evident from the possibilities
of rotations in the q-way space. Algebraically it follows from the consideration that if
$qh$ of the $\lambda$'s in (23.6) are known, $q(q - h)$ are unknown, and the number of conditions
they must obey is

$$\tfrac{1}{2} q (q + 1) - \tfrac{1}{2} h (h + 1) = \tfrac{1}{2} (q - h)(q + h + 1),$$

so that values of the unknowns can be found in at least one way if

$$\tfrac{1}{2}(q + h + 1) \le q, \quad \text{or} \quad h + 1 \le q.$$

Now suppose we express a sum of squares of q normal variates with unit variance,
say A, as the sum of two quantities B and C; and suppose that B is distributed as the
sum of squares of h independent normal variates with unit variance which are linear
functions of the variates entering into A. Then we can find $q - h$ such variates independent
of the first h, and C must be their sum of squares. Further, the distributions
of B and C are independent. By an extension of the same argument, if

$$A = A_1 + A_2 + \ldots + A_k, \tag{23.7}$$

A is distributed as $\chi^2$ with $\nu$ degrees of freedom, $A_1$ with $\nu_1$, ..., $A_{k-1}$ with $\nu_{k-1}$; and
if the variates entering into $A_1 \ldots A_{k-1}$ are mutually independent and are linear functions
of those entering into A, then $A_k$ is distributed as $\chi^2$ with $\nu_k$ degrees of freedom, where

$$\nu = \nu_1 + \nu_2 + \ldots + \nu_k, \tag{23.8}$$

and $A_k$ is independent of $A_1 \ldots A_{k-1}$.

23.6. As an extension and kind of converse of this theorem we have the result, due
to Cochran, that if $A_1 \ldots A_k$ are distributed as $\chi^2$ with $\nu_1 \ldots \nu_k$ degrees of freedom,
and their sum A is distributed as $\chi^2$ with $\nu = \Sigma(\nu_j)$ degrees, then $A_1 \ldots A_k$ are independent.
We will prove this for the case k = 2, the more general result following in a
similar way.

If the characteristic function of $A_1$ and $A_2$ is $\phi(t_1, t_2)$, we have, by hypothesis,

$$\phi(t_1, 0) = \frac{1}{(1 - 2it_1)^{\frac{1}{2}\nu_1}}, \qquad \phi(0, t_2) = \frac{1}{(1 - 2it_2)^{\frac{1}{2}\nu_2}},$$

and

$$\phi(t, t) = \frac{1}{(1 - 2it)^{\frac{1}{2}(\nu_1 + \nu_2)}}.$$

Hence

$$\phi(t, t) = \phi(t, 0)\, \phi(0, t) = \frac{1}{(1 - 2it)^{\frac{1}{2}\nu}},$$

and thus $\phi(t, 0)$ and $\phi(0, t)$ are both divisible by a factor in $(1 - 2it)^{-1}$ and no other
factor in t because of the symmetry of $\phi(t, t)$. These factors are identified by $\phi(t_1, 0)$
and $\phi(0, t_2)$ as $(1 - 2it_1)^{-\frac{1}{2}\nu_1}$ and $(1 - 2it_2)^{-\frac{1}{2}\nu_2}$, and hence

$$\phi(t_1, t_2) = \phi(t_1, 0)\, \phi(0, t_2),$$

or $A_1$ and $A_2$ are independent.
23.7. Let us now return to the statements in (23.4). The sum $\frac{1}{v} \Sigma (x_{ij} - x_{..})^2$ is
distributed as $\chi^2$ with $\nu = N - 1$. The sum $\frac{1}{v} \Sigma (x_{ij} - x_{.j})^2$ is so distributed with
$\nu_2 = N - p$. Further, the quantities $x_{ij} - x_{.j}$ may be transformed to $N - p$ independent
normal variates which are linear functions of the variates entering into the first sum. It
follows from 23.5 that because of the identity (23.3) the third sum $\frac{1}{v} \Sigma n_j (x_{.j} - x_{..})^2$ is
distributed as $\chi^2$ with $\nu_1 = (N - 1) - (N - p) = p - 1$ degrees of freedom, and that
independently of the second sum.

Thus we may exhibit our break-up of the total sum in the following form:


TABLE 23.1
Form of Analysis of Variance for One-way Classification.

                                        Sum of Squares.                       d.f.      Quotient.

Of family means about the mean          $\sum_j n_j (x_{.j} - x_{..})^2$      p - 1     $\frac{1}{p-1} \sum_j n_j (x_{.j} - x_{..})^2$
of the whole

Of individuals in families about        $\sum_{i,j} (x_{ij} - x_{.j})^2$      N - p     $\frac{1}{N-p} \sum_{i,j} (x_{ij} - x_{.j})^2$
the respective family mean

Of individuals about the mean           $\sum_{i,j} (x_{ij} - x_{..})^2$      N - 1     $\frac{1}{N-1} \sum_{i,j} (x_{ij} - x_{..})^2$
of the whole

We note that the sums of squares and the degrees of freedom in the first two rows sum to
those in the third row (though the quantities in the quotient column are not additive).
This is the origin of the expression “analysis of variance,” though, to be accurate, it is the
sum of squares of the total which is analysed.

To avoid cumbrous phrases we refer to the sum of squares of family means about
the mean of the whole as the sum of squares “between families,” and to that of individuals
about the respective family-means (for the time being) as “residual.” We shall also speak
of total sum of squares and total mean with the obvious significance, and denote degrees
of freedom by the initial letters “d.f.” *


* The need has been felt for a word to denote “sum of squares about the mean”. Professor
Pitman has suggested the word “squariance”, though he seems to feel that this leaves something to
be desired. In my own notes I use the word “deviance” but have not ventured to introduce it into
the text.


23.8. Since the mean value of $\chi^2$ with $\nu$ degrees of freedom is $\nu$, the quotients in
Table 23.1 are all unbiassed estimators of v, the parent variance. Only the first two, however,
are independent. We recall that the ratio

$$z = \tfrac{1}{2} \log_e \frac{(N - p)\, \sum_j n_j (x_{.j} - x_{..})^2}{(p - 1)\, \sum_{i,j} (x_{ij} - x_{.j})^2} \tag{23.9}$$

is distributed in Fisher's form, which is independent of the variance v. This distribution
accordingly provides a convenient test of significance in the normal case.


Example 23.1 

Let us consider the application of the foregoing theory to a simple example which 
has been chosen to reduce the arithmetic to a small amount. The following shows the 
lives in hours of four batches of electric lamps :— 


Batch 1: 1600, 1610, 1650, 1680, 1700, 1720, 1800. 
Batch 2: 1580, 1640, 1640, 1700, 1750. 

Batch 3: 1460, 1550, 1600, 1620, 1640, 1660, 1740, 1820.
Batch 4: 1510, 1520, 1530, 1570, 1600, 1680. 


We know that the batches were made from four different specimens of wire, but were other- 
wise made under identical conditions. (This, of course, over-simplifies the problem as it 
is encountered in practice, but will serve for purposes of illustration.) The question is, 
do the batches differ among themselves in length of life ? If so, we suspect that the quality 
of wire is varying materially, and if the lamps are to be standardised as far as possible the 
quality of wire must be made more uniform from batch to batch before manufacture is 
undertaken. The numbers in this example are small, but not much smaller than would 
be desirable in practice, owing to the expense and time involved in testing a lamp by running 
it until it burns out. 
The sums of x and $x^2$ for the four batches will be found to be:

             Number in batch      Σ(x)          Σ(x²)
Batch 1             7            11,760       19,785,400
Batch 2             5             8,310       13,828,100
Batch 3             8            13,090       21,503,700
Batch 4             6             9,410       14,778,700
TOTALS             26            42,570       69,895,900

Thus for the mean life of lamp in the four batches we have 11,760/7 = 1680;
8310/5 = 1662; 13,090/8 = 1636.25; 9410/6 = 1568.33. These certainly differ, but is
the variation such as cannot have arisen by mere sampling fluctuations?

We find

$$x_{..} = 42,570/26 = 1637.3077.$$

Thus

$$\begin{aligned}
\Sigma (x_{ij} - x_{..})^2 &= \Sigma x_{ij}^2 - N x_{..}^2 \\
&= 69,895,900 - 69,700,189 \\
&= 195,711.
\end{aligned}$$

We also have

$$\begin{aligned}
\Sigma\, n_j (x_{.j} - x_{..})^2 &= \sum_j (n_j\, x_{.j}^2) - N x_{..}^2 \\
&= 44,360.
\end{aligned}$$

The analysis then takes the form:

                        Sum of Squares.     d.f.     Quotient.
Between batches              44,360           3       14,787
Residual                    151,351          22        6,880
TOTALS                      195,711          25        7,828

We have

$$z = \tfrac{1}{2} \log_e \frac{14,787}{6880} = 0.383, \qquad \nu_1 = 3, \quad \nu_2 = 22.$$
The 5-per-cent. point for these degrees of freedom is seen from the tables to be 0-5574. 
The observed value is therefore not significant, and we conclude that, so far as this test is 
concerned, there is nothing to throw doubt on the homogeneity of the group. 

Having decided, provisionally at least, to accept the hypothesis that the data are 
homogeneous, we may ask, what is the best estimate of the parent variance ? Our analysis 
has given three different estimates, viz. 14,787, 6880 and 7828. It seems natural to use 
the last, which depends on the greatest number of degrees of freedom. 

With this value we find for the variance of the mean of samples of n,

$$\frac{7828}{n} = \frac{(88.48)^2}{n}.$$

The greatest difference of means observed is that between the first and fourth batch,
1680 - 1568.33 = 111.67. The standard error of this difference is

$$88.48\, \sqrt{\left(\tfrac{1}{7} + \tfrac{1}{6}\right)} = 49.2.$$
The observed difference is rather more than twice the standard error, but we cannot con- 
clude that it is significant on that account. In fact, we have picked out the greatest differ- 
ence for examination from the six possible comparisons of pairs, and the distribution of 
the greatest difference must have a larger standard error than that of a difference chosen 
at random, which is what we have found. Nevertheless the fact that even the greatest 
difference is only slightly in excess of twice the standard error affords some general evidence 
in support of the hypothesis of homogeneity. 

We may also note that if a more accurate test of the difference of two means is required 
the t-test may be invoked ; but here also we must remember that we are testing the greatest 
of a set of differences. Where there are only two families concerned, the analysis of variance 
reduces to the t-test for the difference of sample means when variances of the parents are 
assumed equal. 
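The arithmetic of Example 23.1 may be retraced with a short routine. This is a sketch only: the function name `one_way_anova` and the layout of its result are illustrative, not part of the text. It computes the between-batch and residual sums of squares, their quotients, and z as defined in (23.9).

```python
import math


def one_way_anova(families):
    """Break up the total sum of squares for a one-way classification."""
    values = [x for fam in families for x in fam]
    n_total = len(values)
    p = len(families)
    grand_mean = sum(values) / n_total
    means = [sum(fam) / len(fam) for fam in families]

    between = sum(len(fam) * (m - grand_mean) ** 2 for fam, m in zip(families, means))
    residual = sum((x - m) ** 2 for fam, m in zip(families, means) for x in fam)
    total = sum((x - grand_mean) ** 2 for x in values)

    df_between, df_residual = p - 1, n_total - p
    z = 0.5 * math.log((between / df_between) / (residual / df_residual))
    return {
        "between": between, "residual": residual, "total": total,
        "df_between": df_between, "df_residual": df_residual, "z": z,
    }


# Lives in hours of the four batches of lamps in Example 23.1.
batches = [
    [1600, 1610, 1650, 1680, 1700, 1720, 1800],
    [1580, 1640, 1640, 1700, 1750],
    [1460, 1550, 1600, 1620, 1640, 1660, 1740, 1820],
    [1510, 1520, 1530, 1570, 1600, 1680],
]

result = one_way_anova(batches)
print(result)
```

The sums of squares so obtained agree, to the nearest unit or so, with the 44,360, 151,351 and 195,711 of the example, and z comes out at about 0.383.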


23.9. Suppose now that in the case of one classification we have applied a test by 
means of the analysis of variance and have found that the hypothesis of homogeneity is 
unacceptable, or, in plain English, that the parents do differ. Let us then consider the 
alternative that the populations are still normal and that they differ in their means but 
not in their variances. 

At first sight this may seem a highly artificial assumption to make, for if the popula- 
tions differ in their means it is not unlikely that they may differ in other respects. This 
is undoubtedly so, but if there is serious possibility of difference in variances their homo- 
geneity may be discussed separately by means of tests we shall consider in Chapter 26. 
Apart from this, there often arise in practice situations in which approximate equality of 
variance is plausible on prior grounds. For instance, we may be testing the effect of 
manuring on cereal yields, and it is reasonable to suppose that if the manure exerts any 
effect at all it will increase all plants of the same variety to about the same extent—that 
it will, in fact, displace the location of the distribution of yields without affecting 
its dispersion. 


23.10. The question we have now to consider is whether we can make an estimate 
of the common variance of the populations. A little thought will show that we can. The 
reasoning which led to the conclusion that the residual sum of squares is distributed as
$v\chi^2$ with $N - p$ degrees of freedom remains unchanged, so that the residual quotient in
Table 23.1 continues to provide an estimator of v. The other two no longer do so. Consider,
in fact, the sum of squares between families, and let the mean of the jth family be
$m_j$. Then we have

$$\begin{aligned}
\sum_j n_j (x_{.j} - x_{..})^2 &= \sum_j n_j \{ x_{.j} - m_j - (x_{..} - m_.) + m_j - m_. \}^2 \\
&= \sum_j n_j \{ x_{.j} - m_j - (x_{..} - m_.) \}^2 + \sum_j n_j (m_j - m_.)^2 \\
&\quad + 2 \sum_j n_j (m_j - m_.) \{ x_{.j} - m_j - (x_{..} - m_.) \}.
\end{aligned} \tag{23.10}$$

Here $m_.$ is the mean of the $m_j$ (weighted by the $n_j$), and hence $x_{.j} - x_{..}$ has the mean $m_j - m_.$. Thus
$\sum_j n_j \{ x_{.j} - m_j - (x_{..} - m_.) \}^2$ is distributed as $v\chi^2$ with $p - 1$ degrees of freedom and

$$E \sum_j n_j (x_{.j} - x_{..})^2 = (p - 1)\, v + \sum_j n_j (m_j - m_.)^2. \tag{23.11}$$

Not unless $m_j = m_.$ (that is, all populations have the same mean) does the expression
on the right reduce to $(p - 1)\, v$, and hence the quotient between families give an unbiassed
estimator of v. In other cases it is greater.

Similarly,

$$\begin{aligned}
\sum_{i,j} (x_{ij} - x_{..})^2 &= \sum_{i,j} \{ x_{ij} - m_j - (x_{..} - m_.) \}^2 + \sum_j n_j (m_j - m_.)^2 \\
&\quad + 2 \sum_{i,j} (m_j - m_.) \{ x_{ij} - m_j - (x_{..} - m_.) \}.
\end{aligned} \tag{23.12}$$

The expectation of the difference of the two terms considered in (23.11) and (23.12) confirms
that the residual sum of squares provides an estimator of $(N - p)\, v$.


23.11. A comparison of the formulae we have already reached and those of section 
14.31 will show that the study of intra-class correlation is very closely related to the analysis 
of variance. It is an interesting exercise to derive the z-test directly from the sampling 
distribution of intra-class r given in equation (14.110) (vol. I, p. 362) and vice-versa. 


Two-way Classification
23.12. We proceed to the case when the variate-values belong not to one of a single
set of families but to two, say A and B. In the first instance we shall consider the situation
when there is only a single value in the jth class of A and the kth class of B. Our sample
may then be set out in the tabular form:

                              CLASS B
                   B_1      B_2      ...      B_q
          A_1     x_11     x_12      ...     x_1q
CLASS A   A_2     x_21     x_22      ...     x_2q            (23.13)
          ...      ...      ...      ...      ...
          A_p     x_p1     x_p2      ...     x_pq

This is not a contingency table. The numbers $x_{jk}$ are variate-values, not frequencies.
As usual, $x_{j.}$ signifies the mean of values in the class $A_j$ and $x_{.k}$ the mean of values in the
class $B_k$, $x_{..}$ being the mean of the whole.

We have the algebraic identity

$$\sum_{j,k} (x_{jk} - x_{..})^2 = \sum_{j,k} (x_{jk} - x_{j.} - x_{.k} + x_{..})^2 + q \sum_j (x_{j.} - x_{..})^2 + p \sum_k (x_{.k} - x_{..})^2, \tag{23.14}$$

the cross-product terms vanishing on summation in the usual way.


23.13. We are interested in the variation of the x's according to class membership.
Let us take as our hypothesis that the pq values are homogeneous, that is to say that they
all emanate from (normal) populations with the same mean m and variance v. In such
a case class-membership exerts no influence on variate-values, and the observed differences
are pure sampling effects.

The expression on the left in (23.14) is then distributed as $v\chi^2$ with $pq - 1$ degrees
of freedom. The mean $x_{j.}$ is distributed normally with variance $v/q$ and thus $\sum_j q\,(x_{j.} - x_{..})^2$
is distributed as $v\chi^2$ with $p - 1$ d.f. Similarly, $\sum_k p\,(x_{.k} - x_{..})^2$ is so distributed with
$q - 1$ d.f. Finally the remaining term on the right is distributed as $v\chi^2$ with $(p - 1)(q - 1)$
d.f.; for each term is normal with variance $v\,\dfrac{(p - 1)(q - 1)}{pq}$, since

$$\begin{aligned}
x_{jk} - x_{j.} - x_{.k} + x_{..} = {} & x_{jk} \left( 1 - \frac{1}{q} - \frac{1}{p} + \frac{1}{pq} \right)
 - \left( \frac{1}{q} - \frac{1}{pq} \right) \sum_{l \neq k} x_{jl} \\
& - \left( \frac{1}{p} - \frac{1}{pq} \right) \sum_{l \neq j} x_{lk}
 + \frac{1}{pq} \sum_{l \neq j,\; m \neq k} x_{lm},
\end{aligned}$$

so that the sum of squares of coefficients on the right is

$$\left\{ \frac{(p - 1)(q - 1)}{pq} \right\}^2 + (q - 1) \left( \frac{p - 1}{pq} \right)^2 + (p - 1) \left( \frac{q - 1}{pq} \right)^2 + \frac{(p - 1)(q - 1)}{(pq)^2}
= \frac{(p - 1)(q - 1)}{pq}. \tag{23.15}$$

Thus, since there are $p + q - 1$ linear relations connecting the pq quantities

$$x_{jk} - x_{j.} - x_{.k} + x_{..},$$

their sum of squares is distributed as $v\chi^2$ with $pq - (p + q - 1) = (p - 1)(q - 1)$ degrees
of freedom, which checks against the mean value of the individual square given by (23.15).
We may thus analyse the variance in the following way:


TABLE 23.2
Form of Analysis of Variance for Two-way Classification with One Member in each Subclass

                       Sums of Squares.                                        d.f.               Quotient.

Between A-classes      $q \sum_j (x_{j.} - x_{..})^2$                          p - 1              $\frac{q}{p-1} \sum_j (x_{j.} - x_{..})^2$

Between B-classes      $p \sum_k (x_{.k} - x_{..})^2$                          q - 1              $\frac{p}{q-1} \sum_k (x_{.k} - x_{..})^2$

Residual               $\sum_{j,k} (x_{jk} - x_{j.} - x_{.k} + x_{..})^2$      (p - 1)(q - 1)     $\frac{1}{(p-1)(q-1)} \sum_{j,k} (x_{jk} - x_{j.} - x_{.k} + x_{..})^2$

TOTALS                 $\sum_{j,k} (x_{jk} - x_{..})^2$                        pq - 1


The sums of squares and degrees of freedom (but not the quotients) are additive as
before. It follows from the theorem of 23.6 that the three constituent sums are independent.
Each quotient provides an unbiassed estimator of v.
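The additivity of the sums of squares in the two-way case is easily verified numerically. The sketch below uses a small invented table of p = 2 A-classes and q = 3 B-classes; the function name `two_way_anova` is illustrative only.

```python
def two_way_anova(table):
    """Break up the total sum of squares for a p x q table with one value per subclass."""
    p, q = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (p * q)
    a_means = [sum(row) / q for row in table]                                   # x_j.
    b_means = [sum(table[j][k] for j in range(p)) / p for k in range(q)]        # x_.k

    between_a = q * sum((m - grand) ** 2 for m in a_means)
    between_b = p * sum((m - grand) ** 2 for m in b_means)
    residual = sum(
        (table[j][k] - a_means[j] - b_means[k] + grand) ** 2
        for j in range(p) for k in range(q)
    )
    total = sum((x - grand) ** 2 for row in table for x in row)
    return {
        "between_a": (between_a, p - 1),
        "between_b": (between_b, q - 1),
        "residual": (residual, (p - 1) * (q - 1)),
        "total": (total, p * q - 1),
    }


# Hypothetical variate-values: rows are A-classes, columns are B-classes.
table = [
    [7.0, 8.0, 9.0],
    [6.0, 9.0, 12.0],
]

parts = two_way_anova(table)
print(parts)
```

For this table the three constituent sums are 1.5, 16 and 4 with 1, 2 and 2 degrees of freedom, adding to the total of 21.5 with 5 degrees of freedom.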


23.14. Our use of these results proceeds by an easy generalisation of the method 
exemplified in Example 23.1. We take as our hypothesis the supposition that all samples 
are from normal populations with identical mean and variance. Comparison of the esti- 
mates in the quotient column then provides a test of significance. If the hypothesis is 
rejected we may examine the alternative that means are different but variances identical 
throughout, in which case we shall find that the residual still provides an estimate of the 
variance, provided that an important additional assumption is made. 


Example 23.2 


The following data (Daniels, Supp. J.R.S.S., 1938, 5, 89) show the weight in grams
of 95-yard lengths of wool thread from 100 “ends” being spun on four bobbins, 25 ends
to the bobbin. We are interested in two factors, the variation between bobbins and the
variation in the 25 ends on the same bobbin, according to their position.


TABLE 23.3
Weight in Grams of 100 95-yard Lengths of Wool Thread spun on Four Bobbins.

                          Bobbin Number.
End Number.        1         2         3         4       TOTALS.
     1           7.50      7.23      7.50      7.53       29.76
     2           7.52      7.81      7.77      8.05       31.15
     3           7.70      7.94      7.83      8.16       31.63
     4           7.93      7.94      7.96      7.76       31.59
     5           7.78      7.89      8.02      7.85       31.54
     6           7.73      8.23      7.99      8.14       32.09
     7           8.07      8.27      8.25      8.26       32.85
     8           8.01      8.54      8.24      8.54
     9           8.22      8.24      8.37      8.10
    10           8.24      8.35      8.43      8.15
    11           8.17      8.29      8.46      8.38
    12           8.09      8.54      8.33      8.47
    13           8.11      8.45      8.27      8.38
    14           7.96      8.43      8.24      8.60
    15           8.09      8.47      8.12      8.46
    16           8.04      8.33      8.14      8.43
    17           7.78      8.47      8.19      8.57
    18           8.11      8.63      8.36      8.38
    19           8.17      8.31      8.31      8.16
    20           8.12      8.31      8.47      8.41
    21           8.13      8.10      8.19      8.27
    22           8.01      8.01      8.37      7.96
    23           8.17      7.92      8.27      8.08
    24           8.05      8.27      8.07      8.16
    25           7.91      7.92      8.28      8.52

  TOTALS       199.61    204.89    204.43    205.76      814.69


It simplifies the arithmetic if we take a working mean at 8.00. The total sum of
squares about this mean is then found to be
$$\sum (x_{jk})^2 = 9.3829,$$
and we have also
$$\sum (x_{jk}) = 14.69.$$
Hence
$$\sum (x_{jk} - x_{..})^2 = 9.3829 - (0.1469)(14.69) = 7.224,939.$$
The means of the four bobbins are
7.9844, 8.1956, 8.1772, 8.2304.
With the same working mean we find for the sum of squares
$$\sum_k (x_{.k})^2 = 0.122,986,72;$$
and hence
$$p \sum_k (x_{.k} - x_{..})^2 = 25\,(0.122,986,72) - (0.1469)(14.69) = 0.916,707.$$
The means of the four ends of corresponding position on the four bobbins can, of
course, be found from the totals in the last column of the table, but it is simpler to find
$\sum_j (q x_{j.} - q x_{..})^2$ and then divide by $q$. We find, with the same working mean,
$$q \sum_j (x_{j.} - x_{..})^2 = \frac{1}{q} \sum_j (q x_{j.})^2 - (0.1469)(14.69) = 4.637,814.$$
The continual appearance of the factor $(0.1469)(14.69) = N x_{..}^2$ is to be noted. The
quantity is best computed once for all at the outset.
The residual sum of squares is then obtainable by subtraction, and we have the
following analysis:


TABLE 23.4
Analysis of Variance for the Data of Table 23.3.

                      Sums of Squares.    d.f.    Quotient.
Between bobbins          0.916,707          3      0.3056
Between ends             4.637,814         24      0.1932
Residual                 1.670,418         72      0.0232

TOTALS                   7.224,939         99      0.0730


The variation between bobbins and that between ends are both significant—the ratio 
of the corresponding quotients to the residual quotient is so big in each case as hardly to 
require the z-test. We are led to suspect that the variation between bobbins, small as it 
is, cannot be a chance effect, and it looks as if bobbin number 1 is not getting its fair share 
of thread. Similarly, the weight of thread seems to be dependent on whereabouts the 
thread is spun on the bobbins, and an inspection of the original data suggests a systematic 
variation as we proceed along the bobbin from end number 1 to end number 25, with a 
possible maximum in the middle. If the manufacturing process is to be standardised as 
much as possible, we should have to examine the reasons for the shortage of weight on 
the first bobbin and for this systematic effect of position on the bobbin. 
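
The arithmetic of this example can be retraced from the figures quoted in the text alone. The following Python sketch assumes only those quoted quantities (the working-mean sums, the four bobbin totals and the between-ends sum of squares); the variable names are illustrative and it does not re-read the 100 individual weights.

```python
# Quantities quoted in Example 23.2 (working mean 8.00 already subtracted).
N = 100                       # 25 ends on each of 4 bobbins
sum_x = 14.69                 # sum of deviations from the working mean
sum_x2 = 9.3829               # sum of squared deviations from the working mean
bobbin_totals = [199.61, 204.89, 204.43, 205.76]
ss_ends = 4.637814            # between-ends sum of squares, as quoted in the text

# N times the squared grand mean, i.e. (0.1469)(14.69); computed once.
correction = sum_x ** 2 / N
ss_total = sum_x2 - correction

# Bobbin means measured from the working mean; each bobbin carries 25 ends.
bobbin_means = [t / 25 - 8.0 for t in bobbin_totals]
ss_bobbins = 25 * sum(m * m for m in bobbin_means) - correction

# The residual follows by subtraction.
ss_residual = ss_total - ss_bobbins - ss_ends

analysis = {
    "bobbins": (ss_bobbins, 3),
    "ends": (ss_ends, 24),
    "residual": (ss_residual, 72),
    "total": (ss_total, 99),
}
quotients = {name: ss / df for name, (ss, df) in analysis.items()}
```

Computing `correction` once and reusing it mirrors the advice that the factor is best computed at the outset.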


23.15. Suppose now that, as in the example just given, the hypothesis of homogeneity
is rejected. What interpretation can we put on the residual quotient? Let us
assume that each observation comes from a normal population with variance $v$, but that
the parent mean of the subclass $A_j B_k$ is $m_{jk}$, these quantities varying from one subclass
to another. Is the residual quotient an unbiassed estimator of $v$? In general the answer
is “no”, but there is an important class of case in which it is affirmative.

Let $m_{j.}$ be the mean of the $q$ values of $m_{jk}$ in the class $A_j$, $m_{.k}$ that of the $p$ values
in $B_k$, and $m_{..}$ the mean of the whole set of $m$'s. Then we may write

$$x_{jk} = m_{jk} + \xi_{jk}, \tag{23.16}$$
$$x_{j.} = m_{j.} + \xi_{j.}, \text{ etc.} \tag{23.17}$$

Then
$$\begin{aligned}
E \sum (x_{jk} - x_{j.} - x_{.k} + x_{..})^2 &= E \sum (m_{jk} - m_{j.} - m_{.k} + m_{..} + \xi_{jk} - \xi_{j.} - \xi_{.k} + \xi_{..})^2 \\
&= E \sum (m_{jk} - m_{j.} - m_{.k} + m_{..})^2 + E \sum (\xi_{jk} - \xi_{j.} - \xi_{.k} + \xi_{..})^2,
\end{aligned} \tag{23.18}$$
the product term vanishing as usual. The second term on the right is equal to
$(p - 1)(q - 1)\,v$, for the $\xi$'s are distributed with variance $v$ about zero mean, so that the
term in question is the residual sum of squares in a $p \times q$ two-way classification of a
homogeneous sample and hence has the stated expectation. Thus we have

$$E \sum (x_{jk} - x_{j.} - x_{.k} + x_{..})^2 = \sum (m_{jk} - m_{j.} - m_{.k} + m_{..})^2 + (p - 1)(q - 1)\,v. \tag{23.19}$$
The residual quotient will then provide an unbiassed estimator of $v$ if and only if
$$m_{jk} - m_{j.} - m_{.k} + m_{..} = 0. \tag{23.20}$$


23.16. Now suppose that $x_{jk}$ is made up of three parts which are additive, viz.

(1) the effect of the class $A_j$, say $a_j$;

(2) the effect of the class $B_k$, say $b_k$; and

(3) a residual $\xi_{jk}$ which is normal and has zero mean.
This kind of hypothesis will recur frequently. It amounts to an assumption that there
is in $x_{jk}$ an element $a_j$ which affects alike all members of the class $A_j$ but varies from one
A-class to another; an element $b_k$ which similarly affects alike all members of $B_k$ but varies
from B-class to B-class; and a third component representing random variation which,
apart from the sampling factor, is the same for all subclasses $A_j B_k$. We then have

$$x_{jk} = a_j + b_k + \xi_{jk} \tag{23.21}$$
and

$$\begin{aligned}
m_{jk} &= a_j + b_k \\
m_{j.} &= a_j + b_. \\
m_{.k} &= a_. + b_k \\
m_{..} &= a_. + b_.
\end{aligned} \tag{23.22}$$

where, as usual, the subscript periods in the $a$'s and $b$'s denote averaging. Thus
$$m_{jk} - m_{j.} - m_{.k} + m_{..} = a_j + b_k - (a_j + b_.) - (a_. + b_k) + a_. + b_. = 0,$$

so that (23.20) is satisfied and the residual quotient is an unbiassed estimator of the
variance v.
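
Condition (23.20) is easy to check numerically. The sketch below is illustrative only: the effects `a` and `b` are hypothetical values, and `double_centred` is a helper name introduced here. It shows that the term in the $m$'s vanishes when the parent means are additive and does not vanish when they are, for instance, multiplicative.

```python
def double_centred(m):
    """Return m_jk - m_j. - m_.k + m_.. for every cell of a p x q table."""
    p, q = len(m), len(m[0])
    grand = sum(sum(row) for row in m) / (p * q)
    rows = [sum(row) / q for row in m]
    cols = [sum(m[j][k] for j in range(p)) / p for k in range(q)]
    return [[m[j][k] - rows[j] - cols[k] + grand for k in range(q)]
            for j in range(p)]


a = [1.0, 2.5, 4.0]          # hypothetical A-class effects
b = [0.0, -1.0, 3.0, 0.5]    # hypothetical B-class effects

additive = [[aj + bk for bk in b] for aj in a]
multiplicative = [[aj * bk for bk in b] for aj in a]

# Sum of squares of the m-term appearing in (23.19).
bias_additive = sum(d * d for row in double_centred(additive) for d in row)
bias_multiplicative = sum(d * d for row in double_centred(multiplicative)
                          for d in row)
```

Here `bias_additive` is zero apart from rounding, so the residual quotient is unbiassed; `bias_multiplicative` is positive and would inflate the expected residual sum of squares.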
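Under the same conditions it will be found that

$$E\, q \sum_j (x_{j.} - x_{..})^2 = (p - 1)\,v + q \sum_j (a_j - a_.)^2 \tag{23.23}$$
$$E\, p \sum_k (x_{.k} - x_{..})^2 = (q - 1)\,v + p \sum_k (b_k - b_.)^2 \tag{23.24}$$
$$\begin{aligned}
E \sum_{j,k} (x_{jk} - x_{..})^2 &= (pq - 1)\,v + \sum_{j,k} (a_j - a_. + b_k - b_.)^2 \\
&= (pq - 1)\,v + q \sum_j (a_j - a_.)^2 + p \sum_k (b_k - b_.)^2.
\end{aligned} \tag{23.25}$$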
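
As a sketch of how the first of these results arises under the additive hypothesis (23.21), write $\xi_{j.}$ and $\xi_{..}$ for the corresponding means of the residuals. The $\xi_{j.}$ are then $p$ independent normal variates with variance $v/q$, and the steps may be set out as follows (the other two results follow in the same way):

```latex
\begin{aligned}
x_{j.} - x_{..} &= (a_j - a_.) + (\xi_{j.} - \xi_{..}), \\
E\, q \sum_j (x_{j.} - x_{..})^2
  &= q \sum_j (a_j - a_.)^2 + q\, E \sum_j (\xi_{j.} - \xi_{..})^2
     \qquad \text{(the cross term has zero expectation)} \\
  &= q \sum_j (a_j - a_.)^2 + q\,(p - 1)\,\frac{v}{q} \\
  &= (p - 1)\,v + q \sum_j (a_j - a_.)^2 .
\end{aligned}
```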


23.17. We have supposed that the component $\xi$ had a zero mean, but of course if
all these components had the same mean, the constant common to them could be absorbed
into the functions $a_j$ and $b_k$. Our hypothesis is thus a little more general than it appears.
In certain practical cases it is a plausible hypothesis to make. For instance, in Example
23.2 it is reasonable to suppose that the effect of a particular bobbin is the same for all
ends, and the effect of situation the same for all bobbins. If there is any serious doubt
on the point we have to collect further data and consider interactions in the manner
described later (see 23.22).

It may, however, be noted that if the variation of the $m_{jk}$'s is comparatively small
the appearance of the term containing them in (23.19) does not materially vitiate an estimate
of $v$ from the residual quotient. In any case that estimate will be greater than the unbiassed
estimate, so that our inferences about significant differences of mean values will, properly
interpreted, be on the safe side.


23.18. Before going farther we may remark that the quantity we have called the
residual sum of squares and the associated quotient are often referred to as “error” or
“interaction” terms. The former is likely to cause misunderstanding and is better avoided
altogether, for, as we have seen, it provides a measure of sampling variance, and therefore
of experimental error, only in particular cases. The word “interaction” we shall
define below; it has been used in different senses by different writers, and when consulting
original memoirs the reader should endeavour to ascertain the precise meaning which
is being attached to it, if he can. In considering a given analysis it is as well to reflect
on the precise nature of the items covered by such expressions as “residual”, “remainder”,
“error” and so forth.


Three-way Classification 

23.19. Consider now the case when there are three classifications into A-, B- and
C-classes. As before, we shall consider in the first place one member in each subclass
$A_j B_k C_l$, typified by $x_{jkl}$. We now have

$$\begin{aligned}
\sum (x_{jkl} - x_{...})^2 ={}& \sum (x_{j..} - x_{...})^2 + \sum (x_{.k.} - x_{...})^2 + \sum (x_{..l} - x_{...})^2 \\
&+ \sum (x_{jk.} - x_{j..} - x_{.k.} + x_{...})^2 + \sum (x_{j.l} - x_{j..} - x_{..l} + x_{...})^2 \\
&+ \sum (x_{.kl} - x_{.k.} - x_{..l} + x_{...})^2 \\
&+ \sum (x_{jkl} - x_{jk.} - x_{j.l} - x_{.kl} + x_{j..} + x_{.k.} + x_{..l} - x_{...})^2,
\end{aligned} \tag{23.26}$$

the summations extending over all members of the sample, $pqr$ in number, so that we may
replace expressions such as $\sum_{j,k,l} (x_{j..} - x_{...})^2$ by $qr \sum_j (x_{j..} - x_{...})^2$, etc.

On the usual hypothesis of normality and homogeneity we find that the first three
terms on the right of (23.26) are distributed as $v\chi^2$ with $p - 1$, $q - 1$ and $r - 1$ degrees
of freedom. The second group is so distributed with $(p - 1)(q - 1)$, $(p - 1)(r - 1)$ and
$(q - 1)(r - 1)$ degrees of freedom. The last is distributed with $(p - 1)(q - 1)(r - 1)$
degrees of freedom. All but the last of these results follow from the two-way case, and
the last may be established as in 23.13 or by the consideration that for any fixed $l$ the
term has $(p - 1)(q - 1)$ degrees of freedom and that there are $(r - 1)$ independent $l$'s.

We may then write the analysis in the form shown in Table 23.5. (For the present
the expression “interaction AB” is to be regarded merely as a name given to a particular
sum of squares. As before, the sums of squares and degrees of freedom are additive,
and the seven items into which the total sum of squares is analysed are distributed
independently.)


TABLE 23.5
Form of Analysis of Variance for Three-way Classification with One Member in each Subclass.

                      Sum of Squares.                                                                      d.f.
Between A-classes     $\sum (x_{j..} - x_{...})^2$                                                         $p - 1$
Between B-classes     $\sum (x_{.k.} - x_{...})^2$                                                         $q - 1$
Between C-classes     $\sum (x_{..l} - x_{...})^2$                                                         $r - 1$
Interaction AB        $\sum (x_{jk.} - x_{j..} - x_{.k.} + x_{...})^2$                                     $(p-1)(q-1)$
Interaction BC        $\sum (x_{.kl} - x_{.k.} - x_{..l} + x_{...})^2$                                     $(q-1)(r-1)$
Interaction CA        $\sum (x_{j.l} - x_{j..} - x_{..l} + x_{...})^2$                                     $(r-1)(p-1)$
Residual              $\sum (x_{jkl} - x_{jk.} - x_{j.l} - x_{.kl} + x_{j..} + x_{.k.} + x_{..l} - x_{...})^2$   $(p-1)(q-1)(r-1)$

TOTALS                $\sum (x_{jkl} - x_{...})^2$                                                         $pqr - 1$

Quotient: in each line, the quotient of the sum of squares by the corresponding d.f.
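
The seven-fold decomposition of Table 23.5 may be sketched as follows. This is an illustration under stated assumptions, not a definitive routine: the function name `three_way_anova`, the helper `mean`, and the small $2 \times 3 \times 2$ data array are all hypothetical, and the code favours clarity over speed.

```python
from itertools import product


def three_way_anova(x):
    """Sums of squares for x[j][k][l], one observation per subclass A_j B_k C_l."""
    p, q, r = len(x), len(x[0]), len(x[0][0])
    cells = list(product(range(p), range(q), range(r)))

    def mean(fixed):
        """Mean over every subscript not fixed; `fixed` maps axis -> index."""
        vals = [x[j][k][l] for j, k, l in cells
                if all((j, k, l)[axis] == i for axis, i in fixed.items())]
        return sum(vals) / len(vals)

    g = mean({})
    names = ["A", "B", "C", "AB", "AC", "BC", "residual", "total"]
    ss = dict.fromkeys(names, 0.0)
    for j, k, l in cells:            # summation over all pqr members
        a, b, c = mean({0: j}), mean({1: k}), mean({2: l})
        ab = mean({0: j, 1: k})
        ac = mean({0: j, 2: l})
        bc = mean({1: k, 2: l})
        ss["A"] += (a - g) ** 2
        ss["B"] += (b - g) ** 2
        ss["C"] += (c - g) ** 2
        ss["AB"] += (ab - a - b + g) ** 2
        ss["AC"] += (ac - a - c + g) ** 2
        ss["BC"] += (bc - b - c + g) ** 2
        ss["residual"] += (x[j][k][l] - ab - ac - bc + a + b + c - g) ** 2
        ss["total"] += (x[j][k][l] - g) ** 2

    df = {
        "A": p - 1, "B": q - 1, "C": r - 1,
        "AB": (p - 1) * (q - 1), "AC": (p - 1) * (r - 1),
        "BC": (q - 1) * (r - 1),
        "residual": (p - 1) * (q - 1) * (r - 1),
        "total": p * q * r - 1,
    }
    return ss, df


# Hypothetical 2 x 3 x 2 data, for illustration only.
data = [[[5.0, 6.0], [7.0, 9.0], [4.0, 4.5]],
        [[6.5, 6.0], [8.0, 11.0], [5.0, 3.0]]]
ss, df = three_way_anova(data)
```

Both the seven sums of squares and the seven sets of degrees of freedom add up to the total line, as the text states.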


23.20. If the hypothesis of homogeneity is rejected we may consider the alternative
represented by
$$x_{jkl} = a_j + b_k + c_l + \xi_{jkl}, \tag{23.27}$$
where $\xi$, as usual, is normal with zero mean. As in 23.16 it will be found that the residual
term in Table 23.5 has expectation $(p - 1)(q - 1)(r - 1)\,v$, and hence continues to provide
an unbiassed estimator of $v$. The quotients between classes are affected like those in
equations (23.23) to (23.25); but the interaction terms also provide estimators of $v$ with
the appropriate degrees of freedom. For instance,
$$\begin{aligned}
x_{jk.} - x_{j..} - x_{.k.} + x_{...} ={}& (a_j + b_k + c_. + \xi_{jk.}) - (a_j + b_. + c_. + \xi_{j..}) \\
&- (a_. + b_k + c_. + \xi_{.k.}) + (a_. + b_. + c_. + \xi_{...}) \\
={}& \xi_{jk.} - \xi_{j..} - \xi_{.k.} + \xi_{...},
\end{aligned} \tag{23.28}$$
so that the expectation of the sum of squares of the $x$-terms is that of the $\xi$-terms, which
we know to be $(p - 1)(q - 1)\,v$.


23.21. This brings up a new point arising for the first time in the three-way classi- 
fication. If (23.27) is true, the analysis of variance will provide four different estimators 
of the variance v, namely the interactions AB, BC and CA and the residual. These are 
independent (for they depend only on the $\xi$'s, and the theory appropriate to the case of
homogeneity continues to apply) and their ratios may be tested in the z-distribution. If 
these ratios are such as can have arisen from random sampling we may accept the hypothesis 
represented by (23.27); if not we must reject it. In short, the interaction quotients pro- 
vide a test of the hypothesis (23.27). In the two-way classification no such test is available. 


Interactions 

23.22. On the hypothesis (23.27) the interaction quotients of type AB give unbiassed
estimators of the variance $v$. If in any particular case these quotients differ significantly
among themselves or from any other independent estimator of $v$, we have to reject the
hypothesis. Apart from the normality of the variation of $\xi$, which is not for the moment
in question, this means that we cannot represent the data as the sum of separate effects
due to A-, B- and C-classes, together with a residual $\xi$ which is the same in form for all
subclasses. The effects of the classes are entangled, or, as we may say, they interact.
This is the origin of the term “interaction”.

Suppose, for instance, our data are crop-yields, and membership of the three classes
corresponds to applications of three manures, nitrogen (A), potash (B) and phosphate (C).
The hypothesis represented by (23.27) would then be equivalent to supposing that all three
manures exerted an effect on yields, but that they did so independently. A given dressing
of nitrogen would increase the yield by $a_j$, whatever dressings of the other fertilisers were
applied. But it might happen that the response in yield to $a_j$ varied according to how
much of the others were present: potash might either stimulate the effect of nitrogen or
inhibit it. If this were so, the fertilisers would interact and the hypothesis (23.27) would
break down. Significant departures from homogeneity in the interaction terms usually
lead us to search for possible entanglements of this kind.


23.23. It must not be overlooked, however, that significant interactions do not 
necessarily imply interaction in any real sense. They may arise from heterogeneity in 
the data. To return to our example of crop-yields, suppose the yields were taken from
a series of plots which differed materially in natural fertility. It might very well be found 
that the hypothesis (23.27) could not be justified even if the differences in yields due to 
the natural effect were partially absorbed into the coefficients a, b and c. If by chance 
the heavier dressings of fertilisers were applied to plots of greater fertility, the hypothesis 
might be shown as failing and “significant” interactions appear. Such points as this 
require careful consideration in the interpretation of significance, and we shall illustrate 
them in some examples below. 


23.24. Interactions of type AB, involving two classes, are said to be of the first 
order. When considering the general n-way classification we shall see that there can 
appear interactions of second, third, fourth . . . order. In fact, the residual in Table 23.5 
is formally equivalent to an interaction of the second order, of type ABC, just as the first- 
order interaction is equivalent to the residual in the two-way analysis of Table 23.2. 

To complete the definitions, we may define the sum of squares between A-classes as
an interaction of order zero. The seven constituent items in Table 23.5 would then
correspond to the following:

                 Interaction.     d.f.
Order zero       A                $p - 1$
                 B                $q - 1$
                 C                $r - 1$
Order 1          AB               $(p - 1)(q - 1)$
                 BC               $(q - 1)(r - 1)$
                 CA               $(r - 1)(p - 1)$
Order 2          ABC              $(p - 1)(q - 1)(r - 1)$

This illustrates the general symmetry of the analysis and suggests obvious generalisations.
n-way Classifications

23.25. For instance, with five classes A, B, C, D and E we may analyse the total
sums of squares into $2^5 - 1 = 31$ components. There will be $\binom{5}{1} = 5$ interactions of
order zero; $\binom{5}{2} = 10$ interactions of first order, type AB; $\binom{5}{3} = 10$ interactions of
second order, type ABC; $\binom{5}{4} = 5$ interactions of third order, type ABCD; and one
residual or interaction of fourth order, type ABCDE. The interactions of zero, first and
second order are of a type already familiar. The third-order interactions are typified by

$$\begin{aligned}
\sum (& x_{jklm.} - x_{jkl..} - x_{jk.m.} - x_{j.lm.} - x_{.klm.} \\
&+ x_{jk...} + x_{j.l..} + x_{j..m.} + x_{.kl..} + x_{.k.m.} + x_{..lm.} \\
&- x_{j....} - x_{.k...} - x_{..l..} - x_{...m.} + x_{.....})^2,
\end{aligned} \tag{23.30}$$

and the reader will be able to write down the residual for himself.

As usual, the 31 terms all furnish independent estimators of the variance on the
hypothesis of homogeneity, and if this is rejected we may consider the alternative
represented by

$$x_{jklmn} = a_j + b_k + c_l + d_m + e_n + \xi_{jklmn}. \tag{23.31}$$


The complete analysis in such cases may become very complex, but frequently it is sufficient 
to consider only sums of squares suggested for investigation by prior expectations. 
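
The count of components and the pattern of signs in a typical interaction can be generated mechanically. The sketch below is illustrative: the helper name `interaction_terms` is hypothetical, and each mean is labelled simply by the letters of the classes whose subscripts are retained.

```python
from itertools import combinations
from math import comb

factors = "ABCDE"

# Interactions of each order: order 0 involves one class, order 1 two, etc.
by_order = {order: ["".join(c) for c in combinations(factors, order + 1)]
            for order in range(len(factors))}
counts = {order: len(names) for order, names in by_order.items()}


def interaction_terms(subset):
    """Signed means making up the interaction of the classes in `subset`.

    A mean retaining s of the n subscripts enters with sign (-1)**(n - s).
    """
    terms = []
    for size in range(len(subset), -1, -1):
        sign = (-1) ** (len(subset) - size)
        for kept in combinations(subset, size):
            terms.append((sign, "".join(kept) or "grand mean"))
    return terms


abcd = interaction_terms("ABCD")   # the third-order interaction of type ABCD
```

For `"ABCD"` this yields one positive term, four negative, six positive, four negative and the grand mean with positive sign, which is the pattern of the typical third-order interaction.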


Example 23.3 


The following data show the percentage water-content in a number of samples of
a commercial product. Six samples were chosen; each sample was tested by four different
operators; and each operator carried out the determination by three different methods.
We have thus a 6 x 4 x 3 classification.


TABLE 23.6
Percentage Water-Content of Six Samples determined by Four Operators using Three Methods.

[Body of table not reproduced: four operators, each with six samples and three tests.]

We will first of all analyse the variance systematically with rather more arithmetical
detail than is usually required, in order to illustrate the process.
A great deal of work is saved if we take a working mean at 60. The table then becomes:


TABLE 23.7

[Body of table not reproduced: the data of Table 23.6 measured from 60, by operators, with totals.]

We have shown the totals of the tests for each operator, of the tests for all operators, and
of samples for each test.
We now form three two-way tables from this by adding the values of one of the
variates, e.g.:


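TABLE 23.8

[Body of table not reproduced: operators by samples.]

TABLE 23.9

[Body of table not reproduced: samples by tests.]

[The third two-way table, operators by tests with totals, cited below as Table 23.10, survives only as illegible fragments and is not reproduced.]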


As we have inserted the totals of various kinds in Table 23.7 these subsidiary tables 
can be picked out at once; but in general, totals are not available in the original (and for 
four-way classifications it is difficult to find a form of tabular presentation which will permit 
of their insertion) so that the tables have to be separately compiled. In practice I find it 
convenient to do so in any case to avoid picking out the wrong figures in the original table. 

Pursuing the condensation process, we should now derive three one-way tables from 
Tables 23.8 to 23.10, but in fact the row and column totals already give us what is required 
(and incidentally provide a check on the arithmetic). 

Now we proceed to find the various sums of squares. For the total of all observations
we find $-115$, and for the sum of squares of observations 653. Thus

$$x_{...} = \frac{-115}{72} = -1.597,222$$
$$N x_{...}^2 = -115\, x_{...} = 183.680,556$$
$$\sum (x_{jkl} - x_{...})^2 = \sum (x_{jkl})^2 - N x_{...}^2 = 653 - 183.680,556 = 469.319,444 \tag{23.32}$$

with $6 \times 4 \times 3 - 1 = 71$ degrees of freedom.
For the interactions of order zero we require the sums of type
$$\sum (x_{j..} - x_{...})^2 = \sum (x_{j..})^2 - N x_{...}^2,$$
where summation takes place over the $N$ values. It is, however, unnecessary to work out
the means $x_{j..}$. Consider, for example, the sum of squares between samples. From the
totals of Table 23.8 or Table 23.9 we find ($j$ denoting samples)
$$\sum_j (12 x_{j..})^2 = (-20)^2 + (-20)^2 + \dots + 13^2 = 5009,$$
where the summation is over six values only. Thus, for summation over the 72 values,
$$\sum (x_{j..})^2 = \frac{1}{12} \times 5009 = 417.416,667.$$
Hence
$$\sum (x_{j..} - x_{...})^2 = 417.416,667 - 183.680,556 = 233.736,111 \tag{23.33}$$
with $6 - 1 = 5$ d.f.
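
The device of working from class totals rather than class means can be put in a small function. This is a sketch using only the figures quoted above (grand total $-115$, sum of squares 653, and 5009 for the squared sample totals); the name `ss_between` is hypothetical.

```python
def ss_between(sum_sq_totals, n_per_total, grand_total, n_obs):
    """Sum of squares between classes, computed from class totals.

    sum_sq_totals : sum of the squared class totals
    n_per_total   : number of observations contributing to each class total
    grand_total   : total of all observations (about the working mean)
    n_obs         : number of observations N
    """
    return sum_sq_totals / n_per_total - grand_total ** 2 / n_obs


N = 72
grand_total = -115
correction = grand_total ** 2 / N          # N times the squared grand mean
ss_total = 653 - correction                # (23.32)

# Six sample totals, each the sum of 4 x 3 = 12 observations.
ss_samples = ss_between(5009, 12, grand_total, N)   # (23.33)
```

The same function serves for operators (18 observations per total) and for tests (24 per total) once the corresponding squared totals are at hand.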
Similarly ($k$ denoting operators) we find
$$\sum (x_{.k.} - x_{...})^2 = \frac{1}{18} \sum_k (18 x_{.k.})^2 - 183.680,556 = 16.152,778 \tag{23.34}$$
with 3 d.f.; and ($l$ denoting tests)
$$\sum (x_{..l} - x_{...})^2 = \frac{1}{24} \sum_l (24 x_{..l})^2 - 183.680,556 = 3.444,444 \tag{23.35}$$
with two degrees of freedom.
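Now we require first-order interactions. We have (summation being over the $N$
values)
$$\begin{aligned}
\sum (x_{jk.} - x_{j..} - x_{.k.} + x_{...})^2 ={}& \sum (x_{jk.} - x_{...})^2 + \sum (x_{j..} - x_{...})^2 + \sum (x_{.k.} - x_{...})^2 \\
&- 2 \sum (x_{jk.} - x_{...})(x_{j..} - x_{...}) - 2 \sum (x_{jk.} - x_{...})(x_{.k.} - x_{...}) \\
={}& \sum (x_{jk.} - x_{...})^2 - \sum (x_{j..} - x_{...})^2 - \sum (x_{.k.} - x_{...})^2,
\end{aligned} \tag{23.36}$$
and thus the first-order interaction term is ascertainable from $\sum (x_{jk.})^2$ and quantities which
have already been computed.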
From the body of Table 23.8 (remembering that summation relates to 72 values and
hence that each value in the table is counted 3 times) we find
$$\sum (x_{jk.})^2 = \frac{1}{3} \sum_{j,k} (3 x_{jk.})^2 = 499.666,667.$$
The interaction term is then
$$499.666,667 - 183.680,556 - 233.736,111 - 16.152,778 = 66.097,222 \tag{23.37}$$
with $(6 - 1)(4 - 1) = 15$ d.f.
Similarly in the body of Table 23.9 we find for the sum of squares 1915. Hence the
interaction of samples and tests is
$$\frac{1915}{4} - 183.680,556 - 233.736,111 - 3.444,444 = 57.888,889. \tag{23.38}$$
In the body of Table 23.10 the sum of squares is 1245. Hence the interaction of tests
and operators is
$$\frac{1245}{6} - 183.680,556 - 16.152,778 - 3.444,444 = 4.222,222. \tag{23.39}$$
Finally, the residual is given by the difference of the total sum of squares and the
interactions already found, namely by
$$469.319,444 - 233.736,111 - 16.152,778 - 3.444,444 - 66.097,222 - 57.888,889 - 4.222,222 = 87.777,778 \tag{23.40}$$
with $(6 - 1)(4 - 1)(3 - 1) = 30$ degrees of freedom.
We can now make up the table of variance analysis as follows:
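
The subtractions leading to (23.37) to (23.40) can be retraced as below. The sketch takes the figures quoted in the text as given (the zero-order sums of squares, 499.666,667 for Table 23.8, and the sums 1915 and 1245); the function name `interaction` is hypothetical.

```python
correction = 115 ** 2 / 72                 # N times the squared grand mean
ss_total = 653 - correction                # (23.32)

# Zero-order interactions as quoted in (23.33) to (23.35).
ss = {"S": 233.736111, "O": 16.152778, "T": 3.444444}


def interaction(sum_sq_cells, n_per_cell, main_a, main_b):
    """First-order interaction by (23.36), from squared two-way cell totals."""
    return sum_sq_cells / n_per_cell - correction - main_a - main_b


ss["SO"] = 499.666667 - correction - ss["S"] - ss["O"]     # (23.37)
ss["ST"] = interaction(1915, 4, ss["S"], ss["T"])          # (23.38)
ss["OT"] = interaction(1245, 6, ss["O"], ss["T"])          # (23.39)

# Residual: total less the six items found so far, as in (23.40).
ss["residual"] = ss_total - sum(ss.values())

df = {"S": 5, "O": 3, "T": 2, "SO": 15, "ST": 10, "OT": 6, "residual": 30}
quotients = {name: ss[name] / df[name] for name in ss}
```

Because the quoted inputs are rounded to six decimal places, the results agree with the text only to about that accuracy.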


TABLE 23.11
Analysis of Variance of Data of Table 23.7.

                          Sum of Squares.    d.f.    Quotient.
Between samples (S)           233.736          5      46.747
Between operators (O)          16.153          3       5.384
Between tests (T)               3.444          2       1.722
Interaction SO                 66.097         15       4.406
Interaction OT                  4.222          6       0.704
Interaction ST                 57.889         10       5.789
Residual                       87.778         30       2.926

TOTALS                        469.319         71


We proceed to discuss the data in the light of this analysis. 

The most striking feature of the table is the size of the quotient between samples.
The variance ratio here is $46.747/2.926 = 15.976$, with a corresponding value of $z$ equal to 1.38.
For $\nu_1 = 5$, $\nu_2 = 30$ the 0.1-per-cent. point is 0.8554, and the ratio is highly significant.

We remark in passing on a point which will be taken up later. The ordinary z-test
gives the probabilities that the ratio of two variances chosen at random does not exceed
a given value. But in this case we have deliberately picked out the largest quotient for
one of our estimates. If $z$ had fallen at the 5-per-cent. level we could not have argued that
the odds were 19 to 1 against the event. They are very much less, since we have deliberately
chosen the largest value for comparison with the residual. However, in the present
case our probability is so small that we can confidently assume the significance of $z$ (see
23.27 below).

Our first inference, then, is that the whole sample is not homogeneous. There appear
to be variations from sample to sample which are not assignable to differences between
tests or operators, and if we wished to standardise our product with greater accuracy we
should be led to examine the manufacturing process. This conclusion is, however, subject
to a point which we discuss in the next example.

Having rejected the hypothesis of homogeneity we are now faced with the question
whether the other quotients in Table 23.11 can be compared so as to assess the relative
variability of the other factors. We must then take a new hypothesis, and we will suppose
that the variable may be written
$$x_{jkl} = a_j + \xi_{jkl}, \tag{23.41}$$
where $a_j$ is an unknown quantity expressing the accepted variation between samples.
Unless there is something very peculiar about the tests or operators it is reasonable to
suppose that the variation between samples can be isolated in this way. We will now
suppose that the $\xi$'s, not the $x$'s, are distributed normally with common mean and variance $v$.
If the values given by (23.41) are substituted in the various constituent items of Table
23.5, it will be found that except for the variation between samples all the other sums of
squares assume the same form with $\xi$ written instead of $x$. This, of course, follows from
23.20 of which our present hypothesis is a particular case. On the hypothesis of (23.41)
we are thus enabled to compare the quotients in the table in the usual way. The element
of variation between samples has, so to speak, been abstracted from the discussion.
We then turn to the sum of squares between operators in Table 23.11. The variance
ratio is $5.384/2.926 = 1.84$. For $\nu_1 = 3$, $\nu_2 = 30$ this is not significant. Similarly, for the sum
of squares between tests we find a ratio of $2.926/1.722 = 1.70$, again not significant. Provisionally we
conclude that there is no evidence of variation between operators and tests, apart from
pure sampling effects.
Now we have to consider the interactions. For that of SO we have the variance ratio
$4.406/2.926 = 1.51$, which is not significant. We find the same for the interaction ST. For
OT we have (taking the larger variance as the numerator)
$$z = \tfrac{1}{2} \log_e \frac{2.926}{0.704} = 0.713, \qquad \nu_1 = 30,\ \nu_2 = 6.$$
This value is just beyond the 5 per cent. point and, judged by itself, might have been regarded
as significant; but taken in conjunction with the others it may, perhaps, be accepted as
a permissible sampling fluctuation.
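
The variance ratios and the corresponding values of $z$ (half the natural logarithm of the ratio) can be recomputed from the quotients of Table 23.11. The sketch below is illustrative; `fisher_z` is a hypothetical helper name, and significance points must still be read from tables of $z$.

```python
from math import log


def fisher_z(larger, smaller):
    """Return the variance ratio and z = 0.5 * log_e(ratio)."""
    ratio = larger / smaller
    return ratio, 0.5 * log(ratio)


residual = 2.926   # residual quotient, 30 d.f.

ratio_s, z_s = fisher_z(46.747, residual)     # between samples, (5, 30) d.f.
ratio_o, z_o = fisher_z(5.384, residual)      # between operators, (3, 30) d.f.
ratio_so, z_so = fisher_z(4.406, residual)    # interaction SO, (15, 30) d.f.
# For OT the residual is the larger variance, so it forms the numerator.
ratio_ot, z_ot = fisher_z(residual, 0.704)    # (30, 6) d.f.
```

With the three-decimal quotients of the table, the value obtained for OT agrees with the quoted 0.713 only to about the third decimal place, the inputs being rounded.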

To sum up, therefore, the only evidence of deviation from homogeneity appears in the 
sample-differences, and we see no reason to reject the hypothesis represented by (23.41). 
Since all the other items in the analysis, apart from that between samples, are homo- 
geneous, we could condense the table into the form— 


                      Sum of Squares.    d.f.    Quotient.
Between samples           233.736          5      46.747
Remainder                 235.583         66       3.569

TOTALS                    469.319         71


The reader may wonder why, in carrying out the tests of significance, we have throughout
used the residual quotient as the denominator of the variance ratio, and not, for instance,
one of the interactions. There are two reasons. First, the residual has more degrees of
freedom, so that it is preferable notwithstanding that the z-test is valid for any number
of degrees of freedom. Second, the residual is not so likely to be affected by interactions
which, though not emerging into significance, might nevertheless exist. But once we have
established that an interaction is not significant, there is no reason why it should not be
amalgamated with the residual, as in the condensed table above.
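
Amalgamation amounts to adding the sums of squares and the degrees of freedom of the items judged homogeneous and forming a single quotient. A minimal sketch, using the rounded entries of Table 23.11 (the dictionary layout is an assumption made for illustration):

```python
# (sum of squares, d.f.) for every item other than "between samples".
items = {
    "O": (16.153, 3),
    "T": (3.444, 2),
    "SO": (66.097, 15),
    "OT": (4.222, 6),
    "ST": (57.889, 10),
    "residual": (87.778, 30),
}

pooled_ss = sum(ss for ss, _ in items.values())
pooled_df = sum(df for _, df in items.values())
pooled_quotient = pooled_ss / pooled_df     # the "Remainder" quotient
```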


Example 23.4 

There is a point of great importance concerning the inference from analyses of variance, 
which we will illustrate by an imaginary example based on the data we have just con- 
sidered. Suppose our analysis of variance were of the following form :— 


                      Sum of Squares.    d.f.    Quotient.
Between samples            125             5        25
Between operators           60             3        20
Interaction SO             150            15        10
Remainder                   48            48         1

TOTALS                     383            71

We will suppose that the sums of squares between tests and the other first-order
interactions are not significant, so that they can be amalgamated with the residual to give a
remainder with 48 degrees of freedom as shown.

On this evidence the sums of squares between samples and between operators are both
significant, as also is the interaction SO. What inference can be drawn about the
variability of the product from one sample to another? We know that the readings differ
significantly; but may not this difference itself be due to the demonstrated variation
between operators, or does it really exist? Is there in fact any variability in the
water-content of the product, apart from the sampling effect in homogeneous variation?

The significance of the SO interaction means that we cannot now regard the effects
of operator and sample as independent. We must consider the possibility of entanglement.
This is not the only explanation: there may be some other specific cause of variation
present which we have not thought of, and on which our present data throw no light. But
in this case there is some prior possibility that samples and operators are “entangled” or
interacting in the ordinary sense. An operator may be getting better results from his
material when it has high water-content than in the reverse case; or, knowing that the
mean content is near 60 per cent. he may unconsciously (or even consciously) bring his
determinations nearer to that figure and hence reduce their spread.

In a case of this kind, and indeed in all statistical inquiries, it is important to have
a clear idea of the question which is being asked and of the population to which it relates.
We have had a number of samples and have tested them by four operators each using
three tests. So far as we can see, the tests are equivalent but the operators are not. All
the same, we are not very interested in the variation among operators (unless this is
an experiment in psychology and not in chemistry). What we want to know is whether
the water-content varies in reality, that is to say as the average of a large number of
determinations by different operators. Our particular four are themselves samples of
a population of operators.
If we confine our attention to the four operators and suppose that each has a specific
reaction to particular samples $m_{jk}$, so that
$$x_{jk} = m_{jk} + \xi_{jk}, \tag{23.42}$$
where $\xi$ is a normal random residual with variance $v$ for all $j$, $k$, then in the usual
way we find
$$E \sum (x_{jk} - x_{j.} - x_{.k} + x_{..})^2 = (p - 1)(q - 1)\,v + \sum (m_{jk} - m_{j.} - m_{.k} + m_{..})^2. \tag{23.43}$$
But suppose we consider the matter from a different viewpoint. Regard $m_{jk}$ as itself
chosen at random from a normal population of operators with variance $v'$. Then, taking
expectations over this population in addition, we find from (23.43)
$$E \sum (x_{jk} - x_{j.} - x_{.k} + x_{..})^2 = (p - 1)(q - 1)(v + v'). \tag{23.44}$$

Thus the interaction term provides an unbiassed estimator of the variance $v + v'$ of $x_{jk}$.
By “unbiassed” in this connection we mean that the average over all determinations and
all operators will give the variance of $x_{jk}$ in the population of all determinations and all
operators.
Similarly we shall have, on the same interpretation,
$$\begin{aligned}
E \sum (x_{j.} - x_{..})^2 &= (p - 1)(v + v') \\
E \sum (x_{.k} - x_{..})^2 &= (q - 1)(v + v')
\end{aligned} \tag{23.45}$$
and hence the ratio of either interaction of zero order to the first-order interaction may be
tested for homogeneity. Our analysis then becomes:

                      Sum of Squares.    d.f.    Quotient.
Between samples            125             5        25
Between operators           60             3        20
Residual (SO)              150            15        10

TOTALS                     335            23


Neither ratio is now significant. For the sum of squares between samples we have
a ratio of 2.5, $\nu_1 = 5$, $\nu_2 = 15$, which is below the 5 per cent. point.
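
The change of denominator is the whole of the matter, and may be set out numerically. This sketch uses the imaginary quotients of the example; the dictionary names are illustrative only.

```python
# Quotients from the imaginary analysis of Example 23.4.
quotients = {"samples": 25.0, "operators": 20.0, "SO": 10.0, "remainder": 1.0}

# First view: the four operators are fixed, and the remainder (48 d.f.)
# measures the sampling variance v.
against_remainder = {name: quotients[name] / quotients["remainder"]
                     for name in ("samples", "operators", "SO")}

# Second view: operators are a sample from a population, and the SO
# interaction (15 d.f.) estimates v + v'.
against_interaction = {name: quotients[name] / quotients["SO"]
                       for name in ("samples", "operators")}
```

The ratios of 25, 20 and 10 in the first view fall to 2.5 and 2.0 in the second, which is why the two inferences differ without contradicting one another.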

Thus we should conclude that, regarding the data as a member of possible samples from 
all possible operators, there is little or no evidence of real variation from sample to sample. 
This is quite consistent with the inference we drew at the beginning of the example as to 
the "significance " of the terms concerned, though at first sight it appears directly 
contradictory. In the first case we inferred that for these four operators there were signifi- 
cant differences in their determinations for the samples, so that sample-differences are 
“ real ” in the sense that they cannot be attributed solely to random variation in homo- 
geneous material. In the second case we enlarge the domain by considering operators as 
subject to “ error ” in the sense that one human being differs from another, and find that 
sample-differences can now be ascribed to variation in the population of operators. 

No further emphasis is needed on the care necessary for the proper interpretation of
the results of an analysis of variance. The nature of the population which is being
considered should be brought explicitly to mind in every case; and the reader should form
the habit of asking himself, whenever a result is found to be "significant": significant
of what?


Arithmetic of Variance Analysis 

23.26. Before considering further examples we will dispose of a few points arising 
from the calculation of the constituent sums of squares and the application of the z-test 
in determining the significance of variance-ratios. 

The calculation of sums of squares for an n-way classification can very conveniently 
be carried out by the use of a punched-card system when the data are numerous, and some 
remarkable computing feats have been performed by this technique. For ordinary labora- 
tory work with a machine, the process of Example 23.3 is possibly the best, though some 
modifications may be made to suit individual taste. 

The main work lies in computing the total sum of squares. This is done by finding
the sum of squares of observations from the original data (with a convenient working
mean) and the sum of observations obtained at the same time. The formula

$$\sum (x - \bar{x})^2 = \sum x^2 - N\bar{x}^2 = \sum x^2 - \bar{x} \sum x, \qquad (23.46)$$

in which $\bar{x}$ is the mean of all N observations, then gives the total sum required.
The quantity $N\bar{x}^2$ is constantly needed and should be recorded. It is useful to
preserve a few more decimal places than will ultimately be used in the final presentation
of the analysis.
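
The working-mean device may be sketched in a few lines of Python. This is an illustration only: the function name `total_sum_of_squares`, the observations and the working mean of 50 are assumptions made for the example and are not taken from the text.

```python
def total_sum_of_squares(observations, working_mean=0.0):
    """Total sum of squares about the mean, from deviations d = x - working_mean."""
    n = len(observations)
    deviations = [x - working_mean for x in observations]
    sum_d = sum(deviations)                  # sum of observations about the working mean
    sum_d2 = sum(d * d for d in deviations)  # sum of squares about the working mean
    correction = sum_d ** 2 / n              # N times the squared mean deviation
    return sum_d2 - correction


# Hypothetical observations, for illustration only.
data = [49.5, 62.8, 46.8, 57.0, 59.8, 58.5, 55.5, 56.0]

mean = sum(data) / len(data)
direct = sum((x - mean) ** 2 for x in data)              # definition
via_working_mean = total_sum_of_squares(data, 50.0)      # formula (23.46) with a working mean
via_raw_values = total_sum_of_squares(data)              # formula (23.46) with origin zero
```

The three results coincide because a change of origin leaves the sum of squares about the mean unaltered; the working mean merely keeps the figures small.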

The original data are then condensed into n (n − 1)-way tables by summing over
each class in turn. In Example 23.3 this was done so as to give three tables: Operators-
Samples, Tests-Samples and Operators-Tests. The main body of these tables gives means
of the type $x_{jk.}$ multiplied by a constant factor. A further condensation will give
$\binom{n}{2}$ sets of means of type $x_{j..}$; and so on, as far as is required.

From the condensed tables we can then determine the sums of squares of means of 
various orders, and hence the interactions. The main pitfall lies in the way of the applica- 
tion of the correct multipliers and divisors—it has to be borne in mind that the summation 
takes place over all values of the sample. 

Suppose, for example, we have a four-way classification into classes with p, q, r and s
numbers of members. The first condensation gives us four tables of which a typical one
is p × q × r, based on the sum of s members. The next condensation gives us six two-way
tables typified by p × q, based on the sum of rs members. The third gives us four one-way
tables such as p, based on qrs members. Consider the variance between p-classes:

$$\sum (x_{j...} - x_{....})^2 = \sum x_{j...}^2 - N x_{....}^2, \qquad (23.47)$$

where $x_{j...}$ is the mean of the jth p-class, $x_{....}$ the mean of all N = pqrs
observations, and the summations extend over all N members.
In the condensed one-way table of p classes each term is to be counted qrs times, and
thus, if $S_p$ is the sum of squares in this table as it stands,

$$S_p = \sum_{j=1}^{p} (qrs \, x_{j...})^2.$$

Thus, summing over all members, we find

$$\sum x_{j...}^2 = qrs \sum_{j=1}^{p} x_{j...}^2 = \frac{S_p}{qrs}, \qquad (23.48)$$

whence (23.47) gives the zero-order interaction for p-classes. Similarly for q, r and s.

For the first-order interaction we have

$$\sum (x_{jk..} - x_{j...} - x_{.k..} + x_{....})^2 = \sum (x_{jk..} - x_{....})^2 - \sum (x_{j...} - x_{....})^2 - \sum (x_{.k..} - x_{....})^2. \qquad (23.49)$$

The last two terms on the right have already been found. We require

$$\sum (x_{jk..} - x_{....})^2 = \sum x_{jk..}^2 - N x_{....}^2. \qquad (23.50)$$

If $S_{pq}$ is the sum of squares of elements in the body of the two-way table found by adding
r- and s-items, we find

$$\sum x_{jk..}^2 = \frac{S_{pq}}{rs}, \qquad (23.51)$$

and so on. The general process will now be clear.
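
The pitfall over multipliers and divisors can be checked numerically. The following Python sketch uses a small four-way array of invented observations (the class sizes 2, 3, 2, 2 and the data are assumptions for the illustration) and confirms that the sum of squares of the condensed one-way table, divided by qrs and corrected by N times the squared general mean, agrees with the sum of squares between p-classes evaluated directly over all members.

```python
from itertools import product

p, q, r, s = 2, 3, 2, 2            # hypothetical class sizes
N = p * q * r * s

# Hypothetical observations indexed by (j, k, l, m); any numbers would serve.
x = {idx: float((3 * idx[0] + 5 * idx[1] + 7 * idx[2] + 11 * idx[3]) % 13)
     for idx in product(range(p), range(q), range(r), range(s))}

grand_total = sum(x.values())
correction = grand_total ** 2 / N          # N times the squared general mean

# Condensed one-way table: each entry is the total of q*r*s observations.
p_totals = [sum(v for idx, v in x.items() if idx[0] == j) for j in range(p)]
S_p = sum(t * t for t in p_totals)         # sum of squares of the table as it stands

between_p = S_p / (q * r * s) - correction

# Direct evaluation: sum over all N members of (class mean - general mean)^2.
general_mean = grand_total / N
class_means = [t / (q * r * s) for t in p_totals]
direct = sum((class_means[idx[0]] - general_mean) ** 2 for idx in x)
```

The divisor is the number of observations contributing to each entry of the condensed table; for the two-way table found by adding r- and s-items it would be rs.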

Unfortunately there is no convenient independent check on the calculations. The 
various condensed tables are self-checking since their totals are the sum of all observations, 
but the sums of squares do not check with anything. It is, of course, possible to evaluate 
each individual term in the residual and to check by summing squares, but this is too 
laborious for use except in the simplest cases. 


Use of the z-test for Several Variance-ratios 

23.27. In the complete analysis of n classes there are $2^n - 1$ elements, and the
number of variance ratios arising for test may be considerable. The z-test gives the
probability that a particular value chosen at random will be exceeded. If therefore we pick
out the largest ratios for test, the chance that one of them is "significant" in the sense
of exceeding the 100P-per-cent. point is a good deal greater than P, and we run into the
danger of attributing significance to what may be a pure sampling effect.

Suppose we make r different and independent tests of r values of z. The chance that
each does not exceed a fixed value (depending on the number of degrees of freedom) is
1 − P, where P is some assigned level of significance. Hence the chance that none of
them exceeds its appropriate value is

$$(1 - P)^r = 1 - rP, \text{ approximately}, \qquad (23.52)$$

provided that P and rP are small. For instance, if P = 0.01 and r = 7 the probability
that no z exceeds its appropriate significance value is 0.93, and thus there is a probability
of 0.07 that at least one of them will do so.
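
The arithmetic of (23.52) is easily verified. In the Python sketch below the function name `chance_at_least_one_exceeds` is an invention for the illustration; it returns both the exact chance, on the assumption of independent tests, and the approximation rP.

```python
def chance_at_least_one_exceeds(P, r):
    """Chance that at least one of r independent tests exceeds its 100P-per-cent. point."""
    exact = 1 - (1 - P) ** r     # complement of "none exceeds"
    approximate = r * P          # first-order approximation, valid for small rP
    return exact, approximate


exact, approximate = chance_at_least_one_exceeds(0.01, 7)
```

The approximation deteriorates as rP grows: with P = 0.05 and r = 14, for example, rP is 0.7 whereas the exact chance is nearer one-half.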

In practice the problem of numerous comparisons is more complicated because they 
are not independent. In such circumstances our judgment of significance has to incor- 
porate an element of the intuitive. However, if all the comparisons are based on the 
common residual quotient it is possible to find the probabilities that the largest of r values 
exceeds assigned values. The resulting expressions are complicated, even when all the 
sums of squares have the same degrees of freedom, but reference may be made to Hartley 
(1938) for approximations and to Cochran (1941) and Finney (1941a) for exact expressions. 
The conclusion reached by Finney is that if the degrees of freedom in the residual are 
sufficiently numerous the ratios may be treated as completely independent. 


23.28. There is a particular case of the n-way classification which is worth special
mention, namely, that for which each classification is a simple dichotomy, so that there
are $2^n$ subgroups. This case arises frequently when so-called "factorial" experiments
are being conducted to determine the effect of a treatment which is either applied or
withheld. The analysis of variance remains the same in principle, but of course the arithmetic
becomes a good deal simpler.


Example 23.5 (F. Yates, Supp. J.R.S.S., 1935, 2, 181) 
An area of ground was sown with peas and divided into 24 plots in the manner shown 
in Table 23.12. The plots received, or did not receive, dressings of nitrogen (N), phosphate 
(P) and potash (K) in the manner shown, the yields in pounds being given in the table. 
TABLE 23.12
Yields of Peas and Manurial Treatments on 24 Plots

[The body of Table 23.12 (plot layout, treatments and yields) is not legible in the source.]

There is some purpose here in the alternation of treatments, but that need not concern us
for the present. We have 24 observations in four classes, viz. blocks (3), nitrogen (2),
phosphate (2) and potash (2), giving 3 × 2 × 2 × 2 = 24 records.
Condensing the table by adding blocks we get the following:

No treatment     N       P       K       NP      NK      PK      NPK     TOTAL
    154.3      191.3   163.0   156.0   173.8   164.0   151.5   163.1    1317.0

Condensing according to the three treatments we have, for nitrogen and phosphate,

                N        not-N     TOTALS
P             336.9      314.5      651.4
not-P         355.3      310.3      665.6
TOTALS        692.2      624.8     1317.0

and similarly for the other two pairs of treatments.
We omit the remaining calculations. The analysis in its final form is given in
Table 23.13.
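
Since every treatment is a dichotomy, each treatment effect and treatment interaction has one degree of freedom and its sum of squares is the square of a signed total divided by 24. The Python sketch below is an illustration under that reading: the eight treatment totals are those of the condensed table (read as 154.3, 191.3, 163.0, 156.0, 173.8, 164.0, 151.5, 163.1), and the names `totals` and `effect_sum_of_squares` are inventions for the example.

```python
from itertools import combinations

# Treatment totals over the three blocks; "" denotes no treatment.
totals = {"": 154.3, "N": 191.3, "P": 163.0, "K": 156.0,
          "NP": 173.8, "NK": 164.0, "PK": 151.5, "NPK": 163.1}
n_obs = 24


def effect_sum_of_squares(effect):
    """Sum of squares (1 d.f.) for a main effect or interaction such as 'N' or 'NK'."""
    contrast = 0.0
    for treatment, total in totals.items():
        sign = 1
        for factor in effect:
            # +1 when the factor is applied, -1 when it is withheld
            sign *= 1 if factor in treatment else -1
        contrast += sign * total
    return contrast ** 2 / n_obs


effects = ["".join(c) for size in (1, 2, 3) for c in combinations("NPK", size)]
sums_of_squares = {e: effect_sum_of_squares(e) for e in effects}

for name, value in sums_of_squares.items():
    print(f"{name:4s} {value:9.3f}")
```

The values so found agree with the corresponding entries of Table 23.13 to within a unit or two in the third decimal place, the small differences presumably arising from rounding in the original computation. The block terms cannot be recovered from these totals, since the blocks have been summed out.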


TABLE 23.13
Analysis of Variance of the Data of Table 23.12

                              Sums of Squares.    d.f.    Quotient.

Between blocks (B) . . .          177.803           2       88.90
   "    N  . . . . . . .          189.282           1      189.28
   "    P  . . . . . . .            8.402           1        8.40
   "    K  . . . . . . .           95.202           1       95.20
Interaction BN . . . . .           94.255           2       47.13
   "        BP . . . . .            2.260           2        1.13
   "        BK . . . . .           23.685           2       11.84
   "        NP . . . . .           21.281           1       21.28
   "        NK . . . . .           33.134           1       33.13
   "        PK . . . . .            0.481           1        0.48
   "        BNP  . . . .           25.302           2       12.65
   "        BNK  . . . .           36.004           2       18.00
   "        BPK  . . . .            3.782           2        1.89
   "        NPK  . . . .           37.003           1       37.00
Residual (BNPK)  . . . .          128.489           2       64.24

TOTALS . . . . . . . . .          876.365          23


We have carried out the analysis in full so as to illustrate the arithmetical process
for a four-way classification, but we may note at once that it is unduly elaborate. There
are only 24 observations in the data and we cannot expect them to provide all the answers
to the questions which we could frame as to the significance of the various constituent
items in the analysis. This is borne out by the z-test. The residual variance is 64.24
with two degrees of freedom. For $\nu_1 = 1$, $\nu_2 = 2$ the variance ratio at the 1-per-cent.
point is 98.49 and that for $\nu_1 = 2$, $\nu_2 = 2$ at the same point is 99.00. Only values greater
than about 100 times 64.24 or less than 1/100th of that value would thus be significant.
Only the interaction PK falls outside this range, and even this, among so many, can hardly
be regarded as significant.

The inquiry is not, however, completely frustrated. Since the second-order
interactions are not significant, we amalgamate them with the residual to give a remainder
sum of squares of 230.580 with nine d.f. and a quotient of 25.62. It will now be found
that among the first-order interactions only two are significant, PK and BP being too
small. Had they been too large we might have attributed some genuine significance to
this result, but it is not very plausible to suppose that there is a "real" interaction between
blocks and phosphate, or that phosphate and potash inhibit each other's action. The
differences from expectation are more probably due to individual soil variation from plot
to plot.

If we accept the first-order interactions as not significant, we may amalgamate them 
with the remainder to give the following :— 


                        Sum of Squares.    d.f.    Quotient.

Blocks  . . . . . .         177.803          2       88.90
N . . . . . . . . .         189.282          1      189.28
P . . . . . . . . .           8.402          1        8.40
K . . . . . . . . .          95.202          1       95.20
Remainder . . . . .         405.676         18       22.54

TOTALS  . . . . . .         876.365         23

Here the P-quotient is not significant, but the variance ratio for blocks, 3.94, is near the
5-per-cent. point. The N-quotient will be found to be significant at the 1-per-cent. point,
the K-quotient near to the 5-per-cent. point. Our conclusion is that there is strong
indication that nitrogen influenced the yield, some indication that potash did so, and little
indication that phosphates did so; and that there is ground for suspecting heterogeneity in the
soil partly because of the difference between blocks and partly from some of the first-order
interactions.
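
The two amalgamations and the resulting variance ratios can be retraced as follows. This Python sketch takes the sums of squares and degrees of freedom from Table 23.13 (the NP entry being read as 21.281, the value consistent with its quotient and with the total); the names `table` and `pool` are inventions for the illustration, and no significance points are computed, these being left to the tables of z.

```python
# (sum of squares, d.f.) for each line of Table 23.13.
table = {
    "B": (177.803, 2), "N": (189.282, 1), "P": (8.402, 1), "K": (95.202, 1),
    "BN": (94.255, 2), "BP": (2.260, 2), "BK": (23.685, 2),
    "NP": (21.281, 1), "NK": (33.134, 1), "PK": (0.481, 1),
    "BNP": (25.302, 2), "BNK": (36.004, 2), "BPK": (3.782, 2),
    "NPK": (37.003, 1), "BNPK": (128.489, 2),
}


def pool(names):
    """Amalgamate the named lines: add sums of squares and degrees of freedom."""
    ss = sum(table[n][0] for n in names)
    df = sum(table[n][1] for n in names)
    return ss, df


second_order = ["BNP", "BNK", "BPK", "NPK", "BNPK"]   # second-order terms and residual
first_order = ["BN", "BP", "BK", "NP", "NK", "PK"]

pooled_ss, pooled_df = pool(second_order)
remainder_ss, remainder_df = pool(first_order + second_order)
remainder_quotient = remainder_ss / remainder_df

# Variance ratios of the main effects against the final remainder.
ratios = {name: (table[name][0] / table[name][1]) / remainder_quotient
          for name in ("B", "N", "P", "K")}
```

Each amalgamation enlarges the degrees of freedom of the denominator, which is what makes the later tests sensitive; the price is the assumption that the amalgamated terms estimate nothing but the residual variance.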

In this case, of course, we knew already more or less what was to be expected of these 
data and are the readier to accept the conclusions on that account. Had we known nothing 
of the effect of fertilisers on leguminous crops our conclusions on such slender evidence 
must have been very tentative indeed, particularly if we wished to extend them to peas 
grown on other soils under different climatic conditions with different amounts of fertiliser. 


Example 23.6 (C. E. Gould and W. M. Hampton, Supp. J.R.S.S., 1936, 3, 137) 


In the manufacture of optical glass there appear small bubbles known as “ seed ”, 
which constitute a defect. The glass is made in “ pots ” which take about a year to pre- 
pare, and are run continuously over long periods when once started. There are two pots 
to a furnace and materials are introduced into a pot from time to time which, after fusion, 
provide a “run” of glass. Each run provides several days’ work, one day’s work being 
known as a “ journey ". At each journey quantities of glass are drawn from the pot and 
blown into “ cylinders ”, there being about 18 or 20 to the journey. For the purposes of 
the experiment three cylinders were chosen, the third, tenth and sixteenth, and pieces of 
regular size cut from them for examination as to frequency of seed. The first five journeys 
of each of five runs were sampled. 

We have here a four-way classification 2 (pots) × 5 (runs per pot) × 5 (journeys per
run per pot) × 3 (cylinders per journey per run per pot). The actual dates of the runs
were February 16th, May 23rd, June 12th, September 1st and December 6th, so that the
manufacturing period covered about ten months. We shall assume that the glass was
of the same type throughout, although in actual fact it was different in one or two cases,
but not sufficiently different to affect the analysis.

The topic of main interest here is whether the frequency of seed varies significantly 
according to the four factors concerned. If so, the alteration of manufacturing conditions 
may improve the wastage due to seed ; but if not—and the variation is the kind of thing 
which can be accounted for as chance fluctuation in sampling from a homogeneous popula- 
tion—there is little hope of improvement except perhaps by a radical alteration in the 
process affecting all pots, runs and journeys alike. 


TABLE 23.14
Frequency of "Seed" in Samples of Glass

                          Pot 1.                       Pot 2.
Run.   Journey.   Cyl. 1.  Cyl. 2.  Cyl. 3.    Cyl. 1.  Cyl. 2.  Cyl. 3.

 1        1          41       56      100         52       61       88
          2          55       89       93         49       62       97
          3          35       57       56         34       60       72
          4          78       67      113         47       93      118
          5          33       40      128         16       29      130

 2        1           …       66       36         65       80       40
          2          21       61       49        122       97       79
          3          31       39       25         45       54       72
          4          43       72       52        109      120       80
          5          37       51       67         67       85       63

 3        1           …        …        …         76      139      130
          2          33        …       49         46       58       63
          3          24       39       24         15       33       39
          4          18       18       43         22       16       19
          5          28       42       28         27       19       22

 4        1           …       34       43          …       66       24
          2          24       49       42         40      117      105
          3          21       21       51         30       28       34
          4          21       69       48         36       64       53
          5          76       48       49         39       60       78

 5        1          31       54       40         19       93       36
          2          34       24       46         16       12        2
          3         120      122      120         33       58      107
          4         109      119      120         25       63       90
          5           …       49       60         34       43       30


Before plunging into the analysis of variance it is as well to look over the data to see 
whether they themselves suggest any lines of inquiry. We observe considerable varia- 
bility from journey to journey within the same run, J3 and J4 of run 5 being conspicuous 
in pot 1; and in run 1 the numbers of seed appear to increase from cylinder 1 to cylinder 3 
in a rather exceptional way. The runs themselves seem to differ materially. Prior
considerations also suggested an examination of the way in which frequency of seed varied
between pots, since they were chosen so as to differ substantially in constitution.
A complete analysis of variance of the data is as follows :— 


TABLE 23.15
Analysis of Variance of the Data of Table 23.14

                              Sums of Squares.    d.f.    Quotient.

Between pots (P) . . . .             898            1         898
   "    runs (R) . . . .          14,059            4       3,515
   "    journeys (J) . .           4,355            4       1,089
   "    cylinders (C)  .          10,631            2       5,315
Interaction PR . . . . .          16,133            4       4,033
   "        PJ . . . . .           4,081            4       1,020
   "        PC . . . . .             587            2         293
   "        RJ . . . . .          45,934           16       2,871
   "        RC . . . . .          11,626            8       1,453
   "        JC . . . . .           2,540            8         317
   "        PRJ  . . . .           9,711           16         607
   "        RJC  . . . .          12,472           32         390
   "        PJC  . . . .           1,656            8         207
   "        PRC  . . . .           1,862            8         233
Residual (PRJC)  . . . .           8,110           32         253

TOTALS . . . . . . . . .         144,655          149


The second-order interactions will be found non-significant, so we amalgamate with 
the residual, giving a sum of squares 33,811, d.f. 96, quotient 352. 

It then appears that of the first-order interactions PR, RJ and RC are significant and
PJ may be so. There is beginning to appear evidence of heterogeneity, and that of a rather
complicated kind. It seems that pots are interacting with runs, runs with journeys and
runs with cylinders.

Taking 352 as the quotient, we find that except for P the zero-order interactions are
significant. The five R-means are 68.50, 62.67, 42.23, 47.77 and 59.27, so that the variation
of runs is not a simple rise or fall, which could have been explained as a time-effect. The
five J-means are 58.93, 55.37, 49.97, 64.83 and 51.33, again not a regular effect. The
C-means are 44.46, 59.68 and 64.12, which are significantly different. Inspection of the
table suggests that the first run is the source of the trouble.
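
The zero-order sums of squares can be recovered, approximately, from the class means just quoted: each is the number of observations per class multiplied by the sum of squared deviations of the class means from the general mean. The Python sketch below illustrates this; the function name `between_class_ss` is an invention for the example, and since the means are rounded to two decimals the results agree with Table 23.15 only to within a few units.

```python
def between_class_ss(class_means, per_class):
    """Sum of squares between classes of equal size, from the class means."""
    general_mean = sum(class_means) / len(class_means)
    return per_class * sum((m - general_mean) ** 2 for m in class_means)


# 150 observations: 30 per run, 30 per journey, 50 per cylinder.
run_ss = between_class_ss([68.50, 62.67, 42.23, 47.77, 59.27], per_class=30)
journey_ss = between_class_ss([58.93, 55.37, 49.97, 64.83, 51.33], per_class=30)
cylinder_ss = between_class_ss([44.46, 59.68, 64.12], per_class=50)

pooled_quotient = 352.0   # second-order interactions amalgamated with the residual
ratios = {
    "R": (run_ss / 4) / pooled_quotient,
    "J": (journey_ss / 4) / pooled_quotient,
    "C": (cylinder_ss / 2) / pooled_quotient,
}
```

The ratios for runs and cylinders are large, that for journeys much more modest, which is in keeping with the remarks made on the three sets of means.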

With data as heterogeneous as these it is rather difficult to set up a plausible hypothesis
to test. The interactions of first order suggest that no simple additive effects of the four 
factors will explain observation, and if these terms are used as denominators in tests of 
variance ratios the variation between classes appears on the whole non-significant on the 
usual hypotheses. The analysis, then, suggests several subjects for inquiry as concerns 
the homogeneity of the data, but does not suggest any simple explanation of the observed 
figures. The reader may care to refer to the original paper for a more complete discussion 
of the subject. 




23.29. Perhaps we may pause at this point to review progress. We have seen 
that for an n-way classification of the special type wherein each subclass contains a single 
member, the sum of squares of all observations about their mean can be exhibited as the 
sum of a number of such sums. On the hypothesis of normality and homogeneity each 
constituent sum of squares, on division by its appropriate number of degrees of freedom, 
gives an estimator of the parent variance, and each is distributed as $\chi^2$ independently of
the others. The hypothesis of homogeneity can then be tested in Fisher's z-distribution,
subject to the adoption of a conservative attitude where many tests are made on the same
data. If the hypothesis is rejected we may replace it by a simple form in which the effects
of the different classes are additive, provided that the interactions are not significant.
The particular ratio chosen for a test depends on the hypothesis concerned, and it is
important to have a clear idea of the exact question to which an answer is sought.


23.30. In the next chapter we shall consider the case when the numbers in different
subclasses are not equal, discuss the additive hypothesis in more detail, examine the
relationship of variance- and regression-analysis, and extend our results to the analysis of covariance.
We conclude this chapter by an examination of the important question: what can be 
done with the analysis of variance when the variation is not normal ? 


Non-normal Data 

23.31. The analysis of a sum of squares into its constituent sums can, of course, be
undertaken in all circumstances, but the various quotients may not continue to provide
unbiassed estimators of the parent variance if the population is not normal. What is
equally serious, the constituent sums of squares may not be distributed independently.
Thus, when parent normality cannot be assumed, the quotients in the analysis table are
no longer equal within sampling limits and their ratio is distributed in unknown form; and
even if the form were known it would probably depend on parent parameters and hence
fail to provide an exact test of significance.

The problem has been considered in four ways:

(a) Sampling experiments have been undertaken to see how far moderate deviation
from normality affects the z-distribution;

(b) Attempts have been made to find transformations of the variate to throw the
parent distributions into forms with equal variances, at least approximately,
before the analysis is applied;

(c) By introducing a randomising process into the data before they are collected,
attempts have been made to preserve the z-distribution as a close approximation;
this amounts to a change in the nature of the inference, as we shall see below;

(d) Tests have been found which can be applied to ranked data irrespective of the
parent form; this approach is a particular case of (c), but seems to merit special
mention.


We proceed to consider these four possibilities. 


23.32. The arithmetic entailed by a single analysis of variance, even in simple cases,
implies that an extensive sampling inquiry into the distribution of z in non-normal
populations would be a very formidable undertaking. E. S. Pearson (1931b) has studied in some
detail the case of a one-way classification with unequal numbers, when the distribution
of z becomes equivalent to that of the correlation ratio $\eta^2$. Six populations were chosen,
characterised by the following values:

$\beta_1 = 0$,    $\beta_2 = 2.50$ (symmetrical platykurtic);
$\beta_1 = 0$,    $\beta_2 = 4.1$  (symmetrical leptokurtic);
$\beta_1 = 0$,    $\beta_2 = 7.05$ (symmetrical leptokurtic);
$\beta_1 = 0.2$,  $\beta_2 = 3.3$  (skew, Type III);
$\beta_1 = 0.49$, $\beta_2 = 3.72$ (skew, Type III);
$\beta_1 = 0.99$, $\beta_2 = 3.83$ (very skew, Type I, with abrupt start).

The results suggested that for this range of $\beta_1$ and $\beta_2$ the distribution of z is adequately
represented by Fisher's distribution, and that therefore the homogeneity test may be
applied. The case when the variation changed from group to group was not considered.
It was also concluded that "it seems probable that the more elaborate forms of analysis
of variance are also of fairly wide application".

Some work by Eden and Yates (1933) is often referred to as experimental confirmation 
of the same kind, but in fact it was carried out with rather a different object, that of con- 
firming the z-test for data under randomisation (see below, 23.36). 


Variate Transformations 


23.33. Suppose $\xi$ is a new variate $\xi(x)$. Then approximately we shall have

$$\operatorname{var} \xi = \left(\frac{d\xi}{dx}\right)^2 \operatorname{var} x. \qquad (23.53)$$

If now the parent variance of the x-distribution is related in some known manner to the
mean, say $v = f(m)$, we have

$$\operatorname{var} \xi = \left(\frac{d\xi}{dx}\right)^2 f(m).$$

As a further approximation, if x varies about m by small quantities we have

$$\operatorname{var} \xi = \left(\frac{d\xi}{dx}\right)^2 f(x). \qquad (23.54)$$

Now we wish $\xi$ to have a constant variance, say $\lambda^2$, and if this is so,

$$\frac{d\xi}{dx} = \frac{\lambda}{\sqrt{f(x)}}$$

or

$$\xi = \int \frac{\lambda \, dx}{\sqrt{f(x)}}. \qquad (23.55)$$

Although this expression is arrived at by approximation we are entitled to hope that
the variate $\xi$ will have almost constant variance, and at any rate a more stable variance
than x.

For instance, if the original variation is thought to be of the Poisson type we have
$f(x) = x$, and from (23.55) are led to consider the transformation

$$\xi = \int \frac{\lambda \, dx}{\sqrt{x}} = \sqrt{x}, \qquad (23.56)$$

if we choose $\lambda$ to be ½. Similarly, if the variation is of the binomial type with variance
$p(1 - p)$ we have

$$\xi = \int \frac{\lambda \, dp}{\sqrt{p(1 - p)}} = \sin^{-1} \sqrt{p}, \qquad (23.57)$$

on suitable choice of $\lambda$.


23.34. These transformations are designed to "stabilise" the variance. They do
not necessarily bring the variate closer to normality, though in some cases they will do so;
we have, for instance, seen that $\sqrt{2\chi^2}$ tends to normality quicker than $\chi^2$ (12.7). The
following values (Bartlett 1936d) illustrate the way in which the square-root transformation
stabilises the variance of a Poisson distribution:

 Mean m.    Variance of Poisson      Variance of Poisson
            variate $\sqrt{x}$.      variate $\sqrt{x + ½}$.

   0.0           0.000                    0.000
   0.5           0.310                    0.102
   1.0           0.402                    0.160
   2.0           0.390                    0.214
   3.0           0.340                    0.232
   4.0           0.306                    0.240
   6.0           0.276                    0.245
   9.0           0.263                    0.247
  12.0           0.259                    0.248
  15.0           0.256                    0.248

The term ½ in the third column was added by Bartlett on the analogy of a continuity
correction. For m > 3 the variance is evidently quite stable.
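
Values of this kind can be obtained by direct summation over the Poisson distribution. The following Python sketch is one way of doing so and is not a reproduction of Bartlett's own computation; the function name `variance_of_root` and the truncation of the series at 200 terms (ample for the means considered here) are assumptions of the illustration.

```python
import math


def variance_of_root(m, c=0.0, terms=200):
    """Variance of sqrt(x + c) when x is a Poisson variate with mean m."""
    prob = math.exp(-m)          # probability of x = 0
    mean_root = 0.0              # accumulates E sqrt(x + c)
    mean_square = 0.0            # accumulates E (x + c)
    for x in range(terms):
        mean_root += prob * math.sqrt(x + c)
        mean_square += prob * (x + c)
        prob *= m / (x + 1)      # pass from P(x) to P(x + 1)
    return mean_square - mean_root ** 2


for m in (0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 9.0, 12.0, 15.0):
    print(f"{m:5.1f}  {variance_of_root(m):.3f}  {variance_of_root(m, 0.5):.3f}")
```

For m = 0.5 the summation gives 0.310 and 0.102, and for m = 1 it gives 0.402 for the untransformed root, as tabulated. Both columns tend to ¼, the value of $\lambda^2$ implied by the choice $\lambda$ = ½.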


23.35. If now, having stabilised the variance, we carry out an analysis in the ordinary
way, our residual sums of squares divided by the appropriate degrees of freedom will con- 
tinue to be unbiassed estimates of the common variance v, even if there are differences 
between the means of the classes. Instead of assuming as part of the hypothesis that the 
different classes are distributed with the same variance, we have transformed the variate 
so that this shall be so, at least to a close approximation. Relying further on the result 
that the transformed variates approximate to normality, or that if they do not the differ- 
ence will not seriously vitiate the z-test, we may apply that test to the transformed data 


in the usual way. 


Example 23.7 (Bartlett, 1936d)

Table 23.16 shows the number of wheat seeds out of 50 which failed to germinate in
four repetitions of an experiment with different treatments.

TABLE 23.16
Germination of Wheat Seeds

[The body of Table 23.16 (number of experiment by number of treatment) is not legible in the source.]

In point of fact, treatment 7 was a repetition of treatment 6, the others being different.
The point of interest is whether the treatments exert any effect on germination. We shall
not inquire into any differences between experiments (which appear to be negligible from
the row totals) and shall accordingly consider this as a one-way classification into seven
classes, four numbers to the class.
The presumption is that in any given class the variation is of the binomial type. We
might apply the $\sin^{-1}\sqrt{p}$ transformation, but will adopt instead an ad hoc square-root
transformation obtained as follows:

We have

$$v = np(1 - p).$$

Suppose now that $p = p_0 + \delta$ where $\delta$ is small. Then, neglecting $\delta^2$,

$$v = n(p_0 + \delta - p_0^2 - 2p_0\delta)$$
$$\;\; = n\{(1 - 2p_0)(p - p_0) + p_0 - p_0^2\}$$
$$\;\; = np(1 - 2p_0) + np_0^2.$$

If we now put

$$\xi = \sqrt{x + k + ½},$$

where $k = \dfrac{np_0^2}{1 - 2p_0}$ and x is the observed frequency, then $\xi$ will tend to have constant
variance.

In our example the total frequency is 216 out of 1400 seeds, so that we may take as
an estimate of $p_0$ the ratio 216/1400 = 0.15. The transformed variate then becomes

$$\xi = \sqrt{x + \frac{50 (0.15)^2}{0.70} + ½}$$
$$\;\; = \sqrt{x + 2}, \text{ approximately}.$$
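
The constant added under the square root can be checked in a line or two. In the Python sketch below the variable names are inventions for the illustration; n = 50 is the number of seeds per trial and $p_0$ is taken as 0.15, the rounded value used above.

```python
import math

n = 50                 # seeds per trial
p0 = 0.15              # 216/1400, rounded as above
k = n * p0 ** 2 / (1 - 2 * p0)   # constant that makes the variance proportional to x + k
shift = k + 0.5                  # with the continuity term added


def transform(x, added=2.0):
    """Ad hoc square-root transformation; 'added' is the rounded value of k + 1/2."""
    return math.sqrt(x + added)
```

Here k comes to about 1.6 and k + ½ to about 2.1, so that rounding to 2 is a convenience which costs little in accuracy.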


On this basis the transformed variate-values are:

TABLE 23.17
Transformed Variates of Table 23.16

[The body of Table 23.17 (number of experiment by number of treatment) is not legible in the source.]


The analysis of variance is:

                            Sums of Squares.    d.f.    Quotient.

Between treatments . . .         3.486            6       0.581
Residual . . . . . . . .         4.316           21       0.206

TOTALS . . . . . . . . .         7.802           27

The sum of squares is particularly easy to obtain, being the sum of the original variates
plus twice the number of variate-values.

The variance ratio, 2.8, is barely significant, being just beyond the 5-per-cent. point.
There is little evidence that treatments are exerting any effect on germination, since a
comparison of treatments 6 and 7 (which are the same) indicates that such "significance"
as exists may be due to heterogeneity in the seed.


Randomisation 

23.36. Consider a two-way classification of pq members, the observed value of the
jth A-member of the kth B-class being $x_{jk}$. Following the line already considered in 21.48,
we will consider the z-distribution in the population of values obtained by permuting the
members in any A-class in all possible ways. There will thus be $(q!)^p$ possible values of
z, all based on the observed values. We have already considered a case of this kind in
dealing with the problem of m rankings (16.29) and we shall follow the same procedure
in solving the more general problem.

Let the values be arrayed as

$$\begin{matrix} x_{11} & x_{12} & \dots & x_{1q} \\ x_{21} & x_{22} & \dots & x_{2q} \\ \vdots & \vdots & & \vdots \\ x_{p1} & x_{p2} & \dots & x_{pq} \end{matrix} \qquad (23.58)$$

If $S_R$ is the sum of squares between rows, $S_C$ that between columns and S the total, we
know that in the ordinary case considered earlier in the chapter, $S_C$ is distributed as $v\chi^2$
with q − 1 d.f., and $S - S_R - S_C$ as $v\chi^2$ with (p − 1)(q − 1) d.f. It follows that

$$W = \frac{S_C}{S - S_R} \qquad (23.59)$$

is distributed in the Type I form

$$dF \propto W^{\frac{1}{2}(q-1)-1} (1 - W)^{\frac{1}{2}(p-1)(q-1)-1} \, dW. \qquad (23.60)$$

It is easier to work with W than with z, but there is of course no difficulty in passing from
one to the other.

We proceed to find the first four moments of W in the population of $(q!)^p$ values obtained
by permuting the rows of (23.58) in all possible ways.


23.37. If in (23.58) we increase the members of any row by a constant a, it is easily
seen that $S_C$ and $S - S_R$ remain unaffected, and hence so does W. Thus we may take
the mean of each row to be zero and then $S_R = 0$. With this origin we have

$$W = \frac{S_C}{S} = \frac{\sum_k \left( \sum_j x_{jk} \right)^2}{p \sum_{j,k} x_{jk}^2}. \qquad (23.61)$$

If now

$$R_{jl} = \sum_k x_{jk} x_{lk} \qquad (23.62)$$

and the k-statistics of the q values $x_{jk}$, k = 1, …, q, are written $k_{j1}$, $k_{j2}$, etc., and

$$U = \sum_{j<l} R_{jl}, \qquad (23.63)$$

we find

$$W = \frac{1}{p} + \frac{2U}{p(q - 1) \sum_j k_{j2}}, \qquad (23.64)$$

$$E(R_{jl}) = 0, \qquad (23.65)$$

$$E(R_{jl}^2) = (q - 1) k_{j2} k_{l2}, \qquad (23.66)$$

$$E(R_{jl}^3) = \frac{(q - 1)(q - 2)}{q} k_{j3} k_{l3}, \qquad (23.67)$$

$$E(R_{jl}^4) = \frac{3(q - 1)^3}{q + 1} k_{j2}^2 k_{l2}^2 + \frac{(q - 1)(q - 2)(q - 3)}{q(q + 1)} k_{j4} k_{l4}. \qquad (23.68)$$

Then, for the moments of U,

$$E(U) = 0, \qquad (23.69)$$

$$E(U^2) = (q - 1) \sum{}' k_{j2} k_{l2}, \qquad (23.70)$$

$$E(U^3) = 6(q - 1) \sum{}' k_{j2} k_{l2} k_{m2} + \frac{(q - 1)(q - 2)}{q} \sum{}' k_{j3} k_{l3}, \qquad (23.71)$$

$$E(U^4) = \frac{(q - 1)(q - 2)(q - 3)}{q(q + 1)} \sum{}' k_{j4} k_{l4} + 3(q - 1)^2 \left\{ \left( \sum{}' k_{j2} k_{l2} \right)^2 - \frac{2}{q + 1} \sum{}' k_{j2}^2 k_{l2}^2 \right\}$$
$$\qquad + \frac{12(q - 1)(q - 2)}{q} \sum{}' k_{j3} k_{l3} k_{m2} + 72(q - 1) \sum{}' k_{j2} k_{l2} k_{m2} k_{n2}, \qquad (23.72)$$

where $\sum'$ denotes summation over values for which the subscripts are unequal and
permutations are not allowed.

Finally, for the moments of W we have

$$E(W) = \bar{W} = \frac{1}{p}, \qquad (23.73)$$

$$E(W - \bar{W})^2 = \frac{4 \sum' k_{j2} k_{l2}}{p^2 (q - 1) \left( \sum k_{j2} \right)^2}, \qquad (23.74)$$

$$E(W - \bar{W})^3 = \frac{48 \sum' k_{j2} k_{l2} k_{m2}}{p^3 (q - 1)^2 \left( \sum k_{j2} \right)^3} + \frac{8(q - 2) \sum' k_{j3} k_{l3}}{p^3 q (q - 1)^2 \left( \sum k_{j2} \right)^3}, \qquad (23.75)$$

$$E(W - \bar{W})^4 = \frac{48 \left( \sum' k_{j2} k_{l2} \right)^2}{p^4 (q - 1)^2 \left( \sum k_{j2} \right)^4} - \frac{96 \sum' k_{j2}^2 k_{l2}^2}{p^4 (q - 1)^2 (q + 1) \left( \sum k_{j2} \right)^4} + \frac{1152 \sum' k_{j2} k_{l2} k_{m2} k_{n2}}{p^4 (q - 1)^3 \left( \sum k_{j2} \right)^4}$$
$$\qquad + \frac{16(q - 2)(q - 3) \sum' k_{j4} k_{l4}}{p^4 q (q + 1)(q - 1)^3 \left( \sum k_{j2} \right)^4} + \frac{192(q - 2) \sum' k_{j3} k_{l3} k_{m2}}{p^4 q (q - 1)^3 \left( \sum k_{j2} \right)^4}. \qquad (23.76)$$

These formulae can be derived in the manner of 16.33, but reference may be made to
Pitman (1938) for further details.


23.38. We now consider how far the first four moments of W, as found above, agree
with the first four moments of the distribution (23.60). The mean and variance of the
latter are

$$\frac{1}{p} \quad \text{and} \quad \frac{2(p - 1)}{p^2 (pq - p + 2)}. \qquad (23.77)$$
The means agree exactly. For the variances to agree we must have, from (23.74) and
(23.77),

$$\frac{4 \sum' k_{j2} k_{l2}}{p^2 (q - 1) \left( \sum k_{j2} \right)^2} = \frac{2(p - 1)}{p^2 (pq - p + 2)}. \qquad (23.78)$$

Writing

$$K = \frac{2 \sum' k_{j2} k_{l2}}{\left( \sum k_{j2} \right)^2}, \qquad (23.79)$$

we find that (23.78) is equivalent to

$$K = \frac{(p - 1)(q - 1)}{pq - p + 2}. \qquad (23.80)$$

The ratio K may have any value from 0 to $\dfrac{p - 1}{p}$, the lower limit being approached when
one of the second k-statistics $k_{j2}$ is much larger than the others, the upper limit when they
are all equal. Hence all that can be said about the variance of W is that it is not greater
than $\dfrac{2(p - 1)}{p^3 (q - 1)}$ and that it takes this value when the variance of each p-class is the same.

Turning to the third and fourth moments, we note that in many cases where the
variation is not too skew the quantities $k_{j3}$ and $k_{j4}$ will be negligible. A number of terms in
(23.75) and (23.76) may thus be neglected, but even those that remain are fairly
complicated, and it is difficult to say how far the distribution of W will approach the Type I
distribution (23.60). In practice the values may be worked out and compared. If there
is reasonable agreement, the z-distribution of the variance ratio will hold in the particular
population which we are considering.
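
For a small table the whole permutation population can be enumerated and the first two moments of W checked directly. The Python sketch below does this for an invented 3 × 3 table (the figures, and the names `w_statistic`, `values` and `K`, are assumptions of the illustration); it evaluates W for each of the $(q!)^p$ arrangements and compares the mean with 1/p and the variance with $2K / p^2(q - 1)$, which is the form taken by (23.74) when K is defined as in (23.79).

```python
from itertools import permutations, product
from statistics import variance


def w_statistic(rows):
    """W = S_C / (S - S_R) for a p x q table given as a sequence of rows."""
    p, q = len(rows), len(rows[0])
    grand_mean = sum(sum(row) for row in rows) / (p * q)
    total = sum((x - grand_mean) ** 2 for row in rows for x in row)
    between_rows = q * sum((sum(row) / q - grand_mean) ** 2 for row in rows)
    between_cols = p * sum((sum(row[k] for row in rows) / p - grand_mean) ** 2
                           for k in range(q))
    return between_cols / (total - between_rows)


# Hypothetical observations: p = 3 rows of q = 3 members each.
rows = [[1.0, 4.0, 9.0], [2.0, 3.0, 10.0], [5.0, 6.0, 8.0]]
p, q = len(rows), len(rows[0])

# Every arrangement obtained by permuting the members within each row.
values = [w_statistic(arrangement)
          for arrangement in product(*(permutations(row) for row in rows))]

mean_w = sum(values) / len(values)
var_w = sum((w - mean_w) ** 2 for w in values) / len(values)

# Second k-statistic of each row (the variance with divisor q - 1), and the ratio K.
k2 = [variance(row) for row in rows]
K = 2 * sum(k2[j] * k2[l] for j in range(p) for l in range(j + 1, p)) / sum(k2) ** 2
```

The enumeration grows as $(q!)^p$ and is practicable only for very small tables, which is why the comparison of moments with those of the Type I distribution is the method of practical use.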


23.39. A better approach is to find the Type I distribution which has the same first 
two moments as W and to modify the z-test where necessary. It may be shown that when 
K is not too small the third and fourth moments of W and the fitted Type I distribution 
are in fairly good agreement, so that we may expect a good fit. 


The Type I distribution with mean $\dfrac{1}{p}$ and variance $\dfrac{2K}{p^2(q-1)}$ has the mean and variance of W by definition. Its third moment is easily found to be

$$\frac{8K^2(p-2)}{p^3(q-1)\{(p-1)(q-1)+2K\}}. \qquad (23.81)$$


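We have to see how far this differs from the third moment of W given by (23.75). Now

$$3\sum_{j<l<m} k_{2j} k_{2l} k_{2m} = \sum k_{2j} \sum_{l<m} k_{2l} k_{2m} - \sum_{j \neq l} k_{2j}^2 k_{2l}$$
$$= \sum k_{2j} \sum_{l<m} k_{2l} k_{2m} - \left(\sum k_{2j}^2 \sum k_{2j} - \sum k_{2j}^3\right)$$
$$= \sum k_{2j} \left\{3\sum_{l<m} k_{2l} k_{2m} - \left(\sum k_{2j}\right)^2\right\} + \sum k_{2j}^3,$$

and hence

$$\frac{6\sum_{j<l<m} k_{2j} k_{2l} k_{2m}}{\left(\sum k_{2j}\right)^3} = 3K - 2 + \frac{2\sum k_{2j}^3}{\left(\sum k_{2j}\right)^3}. \qquad (23.82)$$

Since all the k's here concerned are positive,

$$\sum k_{2j} \sum k_{2j}^3 \geq \left(\sum k_{2j}^2\right)^2$$

and hence

$$\frac{\sum k_{2j}^3}{\left(\sum k_{2j}\right)^3} \geq \left\{\frac{\sum k_{2j}^2}{\left(\sum k_{2j}\right)^2}\right\}^2 = (1-K)^2. \qquad (23.83)$$

Hence, from (23.82) and (23.83),

$$\frac{6\sum_{j<l<m} k_{2j} k_{2l} k_{2m}}{\left(\sum k_{2j}\right)^3} \geq 3K - 2 + 2(1-K)^2 = K^2\left(1 - \frac{1-K}{K}\right). \qquad (23.84)$$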
Similarly, since


rR Sh |! á 
pe { P i) Gr eg DU S 


it appears that 


2" kiz kpo kiz 
(X ki)? 
On comparing (23.75) and (23.81), and assuming that the second term in the former may 
be neglected, we see that they differ by the factor whose limits we have found in (23.84) 

and (23.85), namely 2 


6 T uuo. one CNET) 


1—K 3--K 
1— —— 5 
x and T 
If K is not too small the limits are not very different from unity, and the third moments are accordingly in fairly good agreement.

In the same way but with rather more complicated algebra it may be shown that the fourth moments are in fair agreement.

When all the rows are rankings, the case reduces to that considered in 16.33 et seq., and we have already seen that the distribution of W is closely approximated by the Type I distribution in that case.


23.40. Suppose, now, that we have p classes of objects, one of each class belonging to a second series of classes, q in number. As our hypothesis we will suppose that membership of the q-classes is independent of the variate-values, so that we may suppose it to be a matter of chance how the values in any p-class are distributed among the q-classes. On this hypothesis the variance ratio will follow the z-form approximately (subject to the conditions we have discussed above) in the population consisting of the $(q!)^p$ permutations of observed values; and this will be so whether the parent is normal or not.

By shaping the inference in this way, and making it conditional, we are thus able to 
apply the z-test even in cases of non-normality. The test of homogeneity still applies, but 
of course the inference is rather different from the usual type. This point has not, perhaps, 
been adequately emphasised in the past and there still seems to be confusion on the subject. 


Randomised Blocks


23.41. The principle of testing in a conditional population has received its chief 
applications in a certain type of agricultural experiment (and analogous cases in other 
fields), known as a randomised block experiment. We are given p blocks of land and wish 
to test the existence of differential effects among q treatments, e.g. manurial treatments, 
of a crop to be grown on it. We divide each block into q plots and grow the crops on each of the pq plots. In any one block we apply a different treatment to each of the q plots; and we allocate the treatments among the plots at random.

This randomisation is an essential part of the process. If the treatments exert no 
effect the observed yields might have occurred in any order, and by making the inference 
in the proper way we are able to test in the z-distribution without assuming parent nor- 
mality or the non-existence of fertility differences between plots of the same block. If, 
of course, the parent is near to normality the test is strengthened. Had we not allocated 
the treatments at random the use of the z-distribution would not have been valid in the 
absence of normality (at least approximate) on the part of the parent. 
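
The conditional argument can be illustrated numerically. The following is only a sketch under stated assumptions: the yields are hypothetical, the names `variance_ratio` and `randomisation_test` are ours, and a random sample of within-block rearrangements stands in for the full set of $(q!)^p$ permutations. It compares the observed variance ratio with the ratios obtained when the yields of each block are reshuffled among the treatments.

```python
import random


def variance_ratio(blocks):
    """Treatment mean square over residual mean square.

    blocks[j][l] is the yield in block j under treatment l
    (p blocks, q treatments, one plot per cell).
    """
    p, q = len(blocks), len(blocks[0])
    grand = sum(sum(row) for row in blocks) / (p * q)
    block_means = [sum(row) / q for row in blocks]
    treat_means = [sum(row[l] for row in blocks) / p for l in range(q)]
    ss_treat = p * sum((t - grand) ** 2 for t in treat_means)
    ss_resid = sum(
        (blocks[j][l] - block_means[j] - treat_means[l] + grand) ** 2
        for j in range(p)
        for l in range(q)
    )
    return (ss_treat / (q - 1)) / (ss_resid / ((p - 1) * (q - 1)))


def randomisation_test(blocks, n_perm=2000, seed=0):
    """Share of within-block shuffles whose ratio is at least the observed one."""
    rng = random.Random(seed)
    observed = variance_ratio(blocks)
    hits = 0
    for _ in range(n_perm):
        # Permute the yields inside each block independently.
        shuffled = [rng.sample(row, len(row)) for row in blocks]
        if variance_ratio(shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm


# Hypothetical yields: 4 blocks (rows) by 3 treatments (columns).
yields = [
    [10.1, 12.3, 14.0],
    [9.5, 11.8, 13.1],
    [11.0, 12.9, 15.2],
    [8.7, 11.1, 12.8],
]
observed, p_value = randomisation_test(yields)
print(f"observed ratio = {observed:.2f}, permutation proportion = {p_value:.4f}")
```

No assumption of normality enters the calculation; the reference set is generated entirely by the randomisation within blocks.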


23.42. It is of some importance to make clear the exact hypothesis which is being tested in this approach, since misunderstandings on the point have led to some rather heated controversy. If the treatments are numbered 1 to q, we consider the possible yield on the plot j, k if it received the lth treatment, say $x_{jkl}$. In actual fact only one of these treatments was carried out; the other values of $x_{jkl}$ are hypothetical and are based on our conception of what would happen if the treatments were differently distributed. The totality of values $x_{jkl}$ form our hypothetical population. We are supposing that the observed yields can be expressed as

$$x_{jkl} = a_j + \xi_{jkl},$$

where $a_j$ is an effect differing from block to block but constant within blocks, and $\xi_{jkl}$ is the "individual" plot effect which has a zero mean. The hypothesis we have considered in arriving at the validity of the z-test in conditional inferences is that every treatment affects every plot to the same extent, apart from the block effect $a_j$. In short, we suppose that $\xi_{jkl}$ is the same for all l. This is the hypothesis usually tested in data from randomised blocks.

Neyman (1935a) proposed an alternative hypothesis, viz. that the mean effects of 
treatments over all blocks were the same, on the ground that we are interested in average 
treatment effects when testing fertilisers, not the effect on particular plots. The hypothesis 
here is that $\bar{x}_{..l} = \bar{x}_{...}$, which is not the same as before; and it appears from Neyman's
analysis that the z-distribution under randomisation may not hold to such a satisfactory 
approximation as in the former case. Once again we have to stress the importance of 
gaining a clear idea of the hypothesis under test. 


Example 23.8 (Eden and Yates, 1933; Pitman, 1938) 


Eden and Yates considered some data, based on actual experience of heights of wheat 
shoots, comprising eight classes of four, equivalent to the following measurements :— 


Class 
1 2 3 4 5 6 7 8 
433 455 487] 407} 4523 2571 4341 4754 
429 419) 389 574} 4364 2033 526} 473} 
383 479 4634 4774 415 392 470 423) ° 
437 5041 469} 452) 418 426 532 481} 


The variances of the eight classes, in units of jth, are then found to be 
7628; 15,702; 22,669; 59,732; 3,666; 90,593; 26,297; 8672. 
The quantity K of equation (23.79) is then found to be 0.7577. The quantity $\dfrac{(p-1)(q-1)}{pq-p+2}$ is 0.8077. Thus (23.80) is approximately satisfied and we expect that the z-distribution will be approximately reproduced by the data under random permutations.

This was confirmed by Eden and Yates in a sampling experiment on the data. 1000 
sets of permutations were taken and z calculated for each. Agreement with expectation 
was good. 
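
The two figures quoted in this example can be checked from the eight class variances. The sketch below assumes that K of (23.79) is twice the sum of products of pairs of the class variances divided by the square of their sum (equivalently, one minus the ratio of the sum of squares to the squared sum), which reproduces the quoted 0.7577; the variable names are ours.

```python
# Class variances quoted in Example 23.8 (p = 8 classes, q = 4 members each).
variances = [7628, 15702, 22669, 59732, 3666, 90593, 26297, 8672]
p, q = len(variances), 4

total = sum(variances)
sum_sq = sum(v * v for v in variances)

# 2 * (sum over pairs of products) = total^2 - sum of squares,
# so K = 1 - sum_sq / total^2 under the assumed definition.
K = 1 - sum_sq / total**2

# Value that K must take for the variances to agree, as in (23.80).
target = (p - 1) * (q - 1) / (p * q - p + 2)

# Largest value K can take: all class variances equal.
K_max = 1 - 1 / p

print(f"K = {K:.4f}, (p-1)(q-1)/(pq-p+2) = {target:.4f}, upper limit = {K_max:.4f}")
```

Since only ratios of the variances enter K, the unit in which they are expressed does not matter.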


Example 23.9 (Friedman, 1937) 


A good example of data from populations which are probably far from normal is given in Table 23.18, showing the standard deviations of expenditures on various items for six income-groups. The figures relate to families of wage-earners and lower salaried workers in Minneapolis and St. Paul, U.S.A., in 1935-6.


TABLE 23.18 


Standard Deviations of Expenditure on Certain Items of Families in Specified Income Groups. 
(Figures in brackets are ranks.) 


                                             Annual Family Income (dollars).
Category of Expenditure.   750-        1000-       1250-       1500-       1750-       2000-       2250-2500

Housing                    100.3 (5)    68.4 (1)    89.5 (3)    77.9 (2)   100.0 (4)   108.2 (6)   184.9 (7)
Household operation         42.2 (1)    44.3 (3)    60.9 (4)    73.9 (6)    43.9 (2)    61.7 (5)   102.3 (7)
Food                        71.3 (1)    81.9 (2)   100.7 (7)    86.5 (3)   100.3 (5)    90.7 (4)   100.6 (6)
Clothing                    ?7.6 (1)    60.0 (3)    57.0 (2)    60.8 (4)    71.8 (5)    83.0 (6)   117.1 (7)
Furnishings, etc.           58.3 (2)    52.7 (1)    96.0 (6)    60.4 (3)   104.3 (7)    89.8 (5)    85.8 (4)
Transportation              46.3 (1)    82.2 (2)   129.8 (3)   181.0 (6)   172.3 (5)   164.8 (4)   246.8 (7)
Recreation                  19.0 (1)    23.1 (2)    38.7 (3)    45.8 (4)    59.0 (7)    50.7 (5)    55.2 (6)
Personal care                8.3 (1)     8.4 (2)     9.2 (3)    14.3 (6)    10.6 (4)    15.8 (7)    12.5 (5)
Medical care                20.4 (1)    33.5 (2)    60.1 (4)    69.3 (5)   114.3 (7)    45.3 (3)   101.6 (6)
Education                    3.2 (1)     4.1 (2)    12.7 (4)    18.9 (5)     8.9 (3)    41.5 (6)    66.3 (7)
Community welfare            4.1 (1)    18.9 (5)     8.5 (2)    12.9 (3)    25.3 (7)    19.9 (6)    16.8 (4)
Vocation                     7.7 (1)    11.2 (5)    10.4 (2)    10.9 (4)    10.5 (3)    14.0 (6)    14.4 (7)
Gifts                        5.3 (1)    10.9 (2)    11.2 (3)    25.3 (4)    42.3 (5)    48.8 (6)    69.4 (7)
Other                        6.0 (5)     5.6 (4)    22.2 (7)     2.5 (2)     6.2 (6)     1.0 (1)     4.0 (3)


In brackets we show the ranks of the figure for different income-groups for each 
category of expenditure. We wish to know whether the standard deviations for each 
category differ significantly for the different income levels. On the hypothesis that they 
do not it is a matter of chance how the ranks fall. 

The sums of ranks in each column are:

23, 36, 53, 57, 70, 70, 83.

The coefficient of concordance (vol. I, p. 411) is then $W = \dfrac{12S}{m^2(n^3-n)}$, where m = 14, n = 7 and S is the sum of squares of deviations of sums of ranks from the mean $\frac{1}{2}m(n+1) = 56$; we find that S = 2620 and W = 0.4774. We may test the significance (vol. I, p. 419) by writing

$$z = \tfrac{1}{2}\log\frac{(m-1)W}{1-W} = 1.24,$$
$$\nu_1 = (n-1) - \frac{2}{m} = 5.86,$$
$$\nu_2 = (m-1)\nu_1 = 76.1.$$
The value of z is highly significant, and we conclude that standard deviation is related to 
size of income—the more money there is to spend, the more variable is the expenditure 
on particular items. 
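
The arithmetic of this example can be retraced from the ranks in Table 23.18. The sketch below is illustrative only; the variable names are ours, and the formulae for W, z and the degrees of freedom are those quoted in the example.

```python
import math

# Ranks from Table 23.18: one row per category of expenditure (m = 14),
# one column per income-group (n = 7).
ranks = [
    [5, 1, 3, 2, 4, 6, 7],  # Housing
    [1, 3, 4, 6, 2, 5, 7],  # Household operation
    [1, 2, 7, 3, 5, 4, 6],  # Food
    [1, 3, 2, 4, 5, 6, 7],  # Clothing
    [2, 1, 6, 3, 7, 5, 4],  # Furnishings, etc.
    [1, 2, 3, 6, 5, 4, 7],  # Transportation
    [1, 2, 3, 4, 7, 5, 6],  # Recreation
    [1, 2, 3, 6, 4, 7, 5],  # Personal care
    [1, 2, 4, 5, 7, 3, 6],  # Medical care
    [1, 2, 4, 5, 3, 6, 7],  # Education
    [1, 5, 2, 3, 7, 6, 4],  # Community welfare
    [1, 5, 2, 4, 3, 6, 7],  # Vocation
    [1, 2, 3, 4, 5, 6, 7],  # Gifts
    [5, 4, 7, 2, 6, 1, 3],  # Other
]
m, n = len(ranks), len(ranks[0])

column_sums = [sum(row[c] for row in ranks) for c in range(n)]
mean_sum = m * (n + 1) / 2
S = sum((r - mean_sum) ** 2 for r in column_sums)

W = 12 * S / (m**2 * (n**3 - n))          # coefficient of concordance
z = 0.5 * math.log((m - 1) * W / (1 - W))  # natural logarithm
nu1 = (n - 1) - 2 / m
nu2 = (m - 1) * nu1

print(column_sums, S, round(W, 4), round(z, 2), round(nu1, 2), round(nu2, 1))
```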
NOTES AND REFERENCES


The idea of comparing variance between classes with the variance within classes in order to test homogeneity is found as early as Lexis (see footnote on page 119). Modern developments, and particularly the exact test of significance for normal parents, are due mainly to R. A. Fisher. Apart from papers by Irwin (1931 and 1934), connected accounts of the theory of variance analysis are hard to find, many points of theoretical interest being scattered among papers which are primarily practical.

For the general theory and applications reference may be made to Fisher's Statistical Methods (1925a, 1944) and Design of Experiments (1935c, 1942), to a useful introductory account by Goulden (1939), and to the writings of Yates, particularly his Design and Analysis of Factorial Experiments (1937b).

On the question of randomisation in preserving the z-distribution see Eden and Yates (1933), Welch (1937, 1938a), and Pitman (1938). References to work on ranking are given at the end of Chapter 16.

For work on the distribution of the greatest of a set of variances see Fisher (1929a, 1940a), Cochran (1941), Stevens (1939), Hartley (1938), and Finney (1941a). For further work on the square-root and $\sin^{-1}$ transformations see Cochran (1940b), Beall (1942) and Curtiss (1943).

The literature of this subject is now very large. Some further references are given 
at the end of the next chapter. 


EXERCISES 


23.1. If x;(j=1...) are a set of normal independent variates with variances 
1/w,, consider the transformation 


uy = D Xj AU, 
j-1 
where the ls are defined by 


TES Ha bei.i.- 
= 1 ee ht ie E 
ly. Jr (EAE) &—19...j4—1 
i-i ^ 
aUe a) (X) den SERN 
Ig, = 0. gud qt 1 
k=j+1, en 


Show that the /’s are orthogonal and hence that 
DEBT 
K-l 
is distributed as y? with n degrees of freedom. Noting that u, = ot z,/ A/ Ew is dis- 


tributed normally with unit variance depended Of Us. . o Up mts that 


D wy (zp — 3)? 
k=1 


is distributed as $\chi^2$ with n − 1 degrees of freedom.


Hence derive the z-test for the analysis of variance with unequal members in a one-way classification.
(Irwin, 1942.)


23.2. Verify the arithmetic in the analysis of variance of Example 23.5. 
23.3. Verify the arithmetic in the analysis of variance of Example 23.0. 


23.4. In a bivariate table with k rows (different rows corresponding to different 
values of the w-variate) write 


1 A = 
Tate Se Y) 
1 
q aa (tte 8), 


where o? is the variance of the y variate, s? the variance, and n, the frequency in the row 
with variate-value z, Thus 


"n Nye q 
and the ratio on the right is the variance-ratio in a one-way classification with unequal 


numbers. 
Show that, for any form of population, 
E(h)=k—1 E(g-—N-—k 


varh = 2 (k — 1) + (f. -»i + "x 


varde 2(N —À) += 3) [zz LN — 2h 
cor (ig) = Ba 3) {b= 1g -zi). 
Hence, approximately, that 
h\ _ HE (h) varg _ cov (h, D) 
#() "zo Ep roro 
(y m ot , varh  4cov(hq) | 3 may fe 
q, E” (q) 


H(t) EMEA Ea) 
In the case when all rows contain the same frequency 


1 k2 
x(I)ty 


and then Joi 2 
sexe tes) 

AN. 2(k—1)(N — 3) 

wm (2) - a s 


Hence show that the mean and variance of the variance-ratio are, to this order, independent 
of the distribution of y, indicating that the z-test is not very sensitive to deviations from 
normality. 
(E. S. Pearson, 1931b. It is rather remarkable that the correlation of h and q, far from disturbing the z-distribution, contributes to its stability.)


CHAPTER 24 
THE ANALYSIS OF VARIANCE—(2) 


Estimation of Class-differences 

24.1. In the previous chapter we considered the analysis of variance mainly as the 
provider of tests of homogeneity. We have now to examine in more detail the problem of 
estimating class-effects, assuming that the homogeneity tests have shown them to exist. 
We discuss in the first instance the case in which there is only one member in each sub- 
class, and for the sake of simplicity confine ourselves to a two-way classification, though 
the theory is quite general. 

The fundamental hypothesis to be examined is that the data may be expressed in 
the form 

$$x_{jk} = a_j + b_k + \zeta_{jk}, \qquad (24.1)$$

where $a_j$ and $b_k$ represent class-effects and $\zeta$ is a random normal variate with zero mean. Our analysis of variance will have shown whether this is an acceptable hypothesis, and our present problem is to estimate the unknown values of a's and b's from the observed x's.


24.2. The joint probability of the $\zeta$'s is

$$dF \propto \exp\left\{-\frac{1}{2v}\sum_{j,k}(x_{jk} - a_j - b_k)^2\right\} dx_{11} \ldots dx_{pq}, \qquad (24.2)$$

where v is the variance of $\zeta$, and in conformity with the notation used in the previous chapter we have p A-classes and q B-classes. The maximum likelihood estimates of the a's and b's are then those which minimise the sum in curly brackets in (24.2), that is to say, the least-squares solution of the equations (24.1). In the usual way we find


$$\sum_{k=1}^{q}(x_{jk} - a_j - b_k) = 0, \quad j = 1, \ldots p$$
$$\sum_{j=1}^{p}(x_{jk} - a_j - b_k) = 0, \quad k = 1, \ldots q \qquad (24.3)$$

which reduce to

$$x_{j.} - a_j - b_{.} = 0$$
$$x_{.k} - a_{.} - b_k = 0, \qquad (24.4)$$

where a dot in place of a suffix denotes the mean over that suffix. Summing the first equation over j, dividing by p, and subtracting from the first, we obtain

$$x_{j.} - x_{..} = a_j - a_{.}, \quad j = 1, \ldots p \qquad (24.5)$$

and similarly

$$x_{.k} - x_{..} = b_k - b_{.}, \quad k = 1, \ldots q. \qquad (24.6)$$

In (24.5) there are p equations, but if we sum them all we reach the identity 0 = 0, so that only p − 1 are independent. There is thus an element of indeterminacy which we may remove by supposing that $a_{.} = 0$. Similarly we may take $b_{.} = 0$, and then we have

$$a_j = x_{j.} - x_{..}, \quad j = 1, \ldots p \qquad (24.7)$$
$$b_k = x_{.k} - x_{..}, \quad k = 1, \ldots q. \qquad (24.8)$$

Our estimate of any class-effect is equal to the deviation of the mean in that class from the total mean.
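
As a small illustration of (24.7) and (24.8), the following sketch estimates the class-effects for a two-way table with one member in each subclass. The data are hypothetical and the variable names are ours; the general mean is kept separate from the a's and b's, as in the convention that both sets of effects have zero mean.

```python
# Hypothetical two-way table: p = 3 A-classes (rows), q = 4 B-classes (columns),
# one observation in each subclass.
x = [
    [12.0, 15.0, 11.0, 14.0],
    [10.0, 14.0, 9.0, 11.0],
    [14.0, 19.0, 13.0, 18.0],
]
p, q = len(x), len(x[0])

grand_mean = sum(sum(row) for row in x) / (p * q)
row_means = [sum(row) / q for row in x]
col_means = [sum(x[j][k] for j in range(p)) / p for k in range(q)]

# Estimated class-effects: deviation of each class mean from the total mean.
a = [r - grand_mean for r in row_means]
b = [c - grand_mean for c in col_means]

# Residuals after removing the mean and both sets of effects.
resid = [[x[j][k] - grand_mean - a[j] - b[k] for k in range(q)] for j in range(p)]

print("a:", [round(v, 3) for v in a])
print("b:", [round(v, 3) for v in b])
```

The residuals sum to zero along every row and every column, which is simply the statement that the least-squares equations are satisfied.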


24.3. Evidently similar equations arise in the general n-way classification. We shall 
see below that they break down for unequal numbers in subclasses, except in a special 
case when the numbers are proportionate. 

The assumption that $a_j$ and $b_k$ have zero means is not, in effect, a restriction on generality but only a convention. If we prefer it, we may consider the slightly more general hypothesis that $\zeta$ has a mean m, in which case we have to minimise

$$\sum (x_{jk} - a_j - b_k - m)^2. \qquad (24.9)$$

This will be found to lead back to equations (24.7) and (24.8), with the additional equation for estimating m

$$m = x_{..}. \qquad (24.10)$$

Or again, if we prefer to absorb m into the a-effects we have

$$a_j = x_{j.}, \qquad (24.11)$$

the mean of $a_j$ in this case not vanishing. Which form we use is a matter of convenience.


24.4. It is important to notice that the equations of estimation which we have just reached give each $a_j$ and $b_k$ independently of values in other classes. We obtain the same equation for $a_j$ whether we happen to be estimating other a's and b's or not. This property, as we shall see shortly, fails to hold if the numbers in subclasses are disproportionate. The situation is similar to that in which we can determine the constants in a regression line independently of the others if orthogonal polynomials are used, in that each constant is given by a separate equation not containing any of the others. Data of this kind are called orthogonal.

The direct comparison of class-means which is possible with orthogonal data can be seen, from general considerations, to be legitimate. In comparing $x_{i.} - x_{..}$ with $x_{j.} - x_{..}$, the estimates of the effects in the ith and jth A-classes, we are in each case averaging over q B-classes with one member in each. The B-classes, therefore, affect each mean to the same extent and do not affect their difference. If there are more members in some subclasses than in others, the means are unequally weighted with different B-effects and the comparison is invalidated.


24.5. Regarding $x_{j.} - x_{..}$ as the estimate of $a_j$ and $x_{.k} - x_{..}$ as the estimate of $b_k$, we see that the familiar equation

$$\sum (x_{jk} - x_{..})^2 = \sum (x_{j.} - x_{..})^2 + \sum (x_{.k} - x_{..})^2 + \sum (x_{jk} - x_{j.} - x_{.k} + x_{..})^2 \qquad (24.12)$$

can be regarded as an analysis of the sum of squares on the left, which has pq − 1 degrees of freedom, into terms in which there is one degree of freedom for every fitted constant and a residual with (p − 1)(q − 1) degrees of freedom. Every constant fitted reduces the number of degrees of freedom in the residual by unity.
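
The identity (24.12) and the associated count of degrees of freedom may be verified numerically. The sketch below uses a hypothetical table and our own variable names; every sum runs over all pq cells, so that the sum for A-classes is q times the sum of squared deviations of the p row means.

```python
# Hypothetical orthogonal data: p = 3 rows, q = 4 columns, one value per cell.
x = [
    [7.0, 9.5, 6.0, 8.5],
    [5.5, 9.0, 4.0, 6.5],
    [8.0, 12.0, 7.5, 11.0],
]
p, q = len(x), len(x[0])
cells = [(j, k) for j in range(p) for k in range(q)]

grand = sum(x[j][k] for j, k in cells) / (p * q)
row = [sum(x[j]) / q for j in range(p)]
col = [sum(x[j][k] for j in range(p)) / p for k in range(q)]

ss_total = sum((x[j][k] - grand) ** 2 for j, k in cells)
ss_a = sum((row[j] - grand) ** 2 for j, k in cells)
ss_b = sum((col[k] - grand) ** 2 for j, k in cells)
ss_resid = sum((x[j][k] - row[j] - col[k] + grand) ** 2 for j, k in cells)

# Degrees of freedom: one for every independent fitted constant.
df_total, df_a, df_b = p * q - 1, p - 1, q - 1
df_resid = (p - 1) * (q - 1)

print(ss_total, ss_a + ss_b + ss_resid)
print(df_total, df_a + df_b + df_resid)
```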
Unequal Numbers in Subclasses


24.6. For a one-way classification we have already considered (23.7 and 23.8) the 
case where the numbers in subclasses are unequal. It was seen that the total sum of squares 
could be expressed as a sum between classes and a residual which were independently 
distributed and whose ratio therefore provided a homogeneity test in the usual way. 

When we try to extend this result to two-way or generally to n-way classifications, 
we begin to run into difficulties. We can still find, as shown below, an estimator of v based
on p — 1 degrees of freedom and differences between A-classes, and one with q — 1 d.f. 
based on differences between B-classes; but these are no longer independent, and conse- 
quently we cannot subtract their sum from the total sum of squares in order to obtain 
a residual or an interaction term which also provides an unbiassed estimator. 

On the other hand, there is now available an independent estimator of v which did 
not appear in the orthogonal case where only one member was included in each subclass.
In fact, since there are several members in any given subclass, we can find an estimator 
of v based on those members alone ; and we may pool all such to form an estimator with 
N — pq degrees of freedom, where there are pq subclasses. This estimator will be inde- 
pendent of subclass means and any estimators based on them, and hence provides 
a "residual" such as we require to carry out homogeneity tests. 


24.7. Suppose we have a two-way classification into p A-classes and q B-classes, and let the number of members in the subclass $A_j B_k$ be $n_{jk}$. Let $\bar{x}_{jk}$ be the mean of these members. We may array the means as

$$\begin{array}{cccc}
\bar{x}_{11} & \bar{x}_{12} & \ldots & \bar{x}_{1q} \\
\bar{x}_{21} & \bar{x}_{22} & \ldots & \bar{x}_{2q} \\
\vdots & \vdots & & \vdots \\
\bar{x}_{p1} & \bar{x}_{p2} & \ldots & \bar{x}_{pq}
\end{array} \qquad (24.13)$$

Now we may, in the first instance, test for homogeneity by ignoring the differences between A- and B-classification and merely regarding the data as a one-way classification with pq classes. The usual test for homogeneity is then applicable. The sum of squares between means of classes will have pq − 1 degrees of freedom, the total N − 1 d.f., and the residual N − 1 − (pq − 1) = N − pq d.f. This residual, in fact, is the one mentioned in the previous section, and is based on the pooled sums of squares within the pq classes. The other term based on pq − 1 degrees of freedom is the sum

$$\sum n_{jk} (\bar{x}_{jk} - \bar{x}_{..})^2$$

and is derivable from the array (24.13).


24.8. To test the effect of A-classification separately we proceed as follows:

Any $\bar{x}_{jk}$ is the mean of $n_{jk}$ values and, on the usual hypothesis as to normality, will have variance $\dfrac{v}{n_{jk}}$. If $\bar{x}_{..}$ is the mean of all N values we have

$$\bar{x}_{..} = \frac{1}{N}\sum_{j,k} n_{jk} \bar{x}_{jk}. \qquad (24.14)$$

Let the marginal unweighted means in (24.13) be $\bar{x}_{j.}$, $\bar{x}_{.k}$, so that

$$\bar{x}_{j.} = \frac{1}{q}\sum_k \bar{x}_{jk}. \qquad (24.15)$$

On the hypothesis of homogeneity the variance of $\bar{x}_{j.}$ is given by

$$\frac{v}{q^2}\left(\frac{1}{n_{j1}} + \ldots + \frac{1}{n_{jq}}\right) = \frac{v}{q^2}\sum_k \frac{1}{n_{jk}} = \frac{v}{N_j}, \qquad (24.16)$$

where

$$\frac{1}{N_j} = \frac{1}{q^2}\sum_k \left(\frac{1}{n_{jk}}\right). \qquad (24.17)$$

Now let us regard the means $\bar{x}_{j.}$ as the means in p classes whose numbers are $N_j$, as is legitimate from (24.16). Then writing

$$\bar{x} = \frac{\sum N_j \bar{x}_{j.}}{\sum N_j}, \qquad (24.18)$$

we have for an unbiassed estimator of v

$$\frac{1}{p-1}\sum N_j (\bar{x}_{j.} - \bar{x})^2 = \frac{1}{p-1}\left\{\sum N_j \bar{x}_{j.}^2 - \bar{x}^2 \sum N_j\right\}. \qquad (24.19)$$

This estimator has p − 1 degrees of freedom and is distributed as $\chi^2$. (This follows from the one-way case except that $N_j$ may not be integral; and its general truth may be established as in Exercise 23.1.) It is independent of the residual with N − pq d.f., and hence the A-effects may be tested separately.

Similarly, if

$$\frac{1}{M_k} = \frac{1}{p^2}\sum_j \left(\frac{1}{n_{jk}}\right), \qquad (24.20)$$

an unbiassed estimator of v is given by

$$\frac{1}{q-1}\left\{\sum_k M_k \bar{x}_{.k}^2 - \bar{x}'^2 \sum_k M_k\right\}, \qquad (24.21)$$

where

$$\bar{x}' = \frac{\sum_k M_k \bar{x}_{.k}}{\sum_k M_k}, \qquad (24.22)$$

and this also may be compared with the independent estimator based on N − pq d.f.


Example 24.1 (data from Brandt (1933) considered by Yates (1934a))

Table 24.1 shows, for a number of breeds of pig, the numbers of each breed, divided into male and female, and the total logarithm of the percentage bacon yielded by the slaughtered carcases. The logarithm has been taken so as to normalise the variate.
TABLE 24.1

Numbers and Logarithm of Percentage Bacon in Breeds of Pigs.

                            Female.                      Male.
Breed.              Number.   Log. Percent.      Number.   Log. Percent.
                              Bacon.                       Bacon.

Hampshire              33        66.55              89        181.04
Duroc Jersey           51        98.69             141        281.43
Tamworth               13        25.90              17         34.20
Yorkshire               4         7.62               9         17.58
Berkshire               8        14.64               4          8.20
Poland China           15        28.11              32         64.42
Chester White          35        66.90              47         90.52
Others                 12        23.32              23         46.70

TOTALS                171       331.73             362        724.09


The total sum of squares, which is not obtainable from this table as it stands, we quote as 13.0142.
The class-means and reciprocals of class-frequencies are given in Table 24.2.


TABLE 24.2
Class-Means and Reciprocals of Class-Frequencies for the Data of Table 24.1.

                              Female.                    Male.              Unweighted
Breed.                   Mean.       1/n_jk         Mean.       1/n_jk      Mean of
                                                                            Means.

Hampshire              2.016,667    0.030,30      2.034,158    0.011,24    2.025,412
Duroc Jersey           1.935,099    0.019,61      1.995,958    0.007,09    1.965,528
Tamworth               1.992,307    0.076,92      2.011,765    0.058,82    2.002,036
Yorkshire              1.905,000    0.250,00      1.953,333    0.111,11    1.929,167
Berkshire              1.830,000    0.125,00      2.050,000    0.250,00    1.940,000
Poland China           1.874,000    0.066,67      2.013,125    0.031,25    1.943,562
Chester White          1.911,429    0.028,57      1.925,958    0.021,28    1.918,694
Others                 1.943,333    0.083,33      2.030,434    0.043,48    1.986,884

Unweighted Mean of                  (Total)                    (Total)
Means                  1.925,979    0.680,40      2.001,841    0.534,27    1.963,910
Taking first the classification into male and female (q = 8), we find, from the relations

$$\frac{1}{N_j} = \frac{1}{q^2}\sum_k \frac{1}{n_{jk}},$$

$$N_1 = \frac{64}{0.680,40} = 94.0623, \qquad N_2 = \frac{64}{0.534,27} = 119.7896.$$

Then, from (24.18),

$$\bar{x} = \frac{\sum N_j \bar{x}_{j.}}{\sum N_j} = \frac{(94.0623 \times 1.925,979) + (119.7896 \times 2.001,841)}{94.0623 + 119.7896} = 1.968,474.$$

Thus our estimate of v, with one degree of freedom,

$$= \sum (N_j \bar{x}_{j.}^2) - \bar{x}^2 \sum (N_j) = 0.3032.$$

Similarly for the eight breed-classes we find an estimate of v with seven degrees of freedom to be 0.0865.

Considering the 16 subclasses as a one-way classification, we find the following preliminary analysis (the arithmetical details of which we omit):


TABLE 24.3
Analysis of Variance of Data in Table 24.1.

                        Sum of Squares.     d.f.     Quotient.
Between classes              1.2715           15       0.0848
Residual                    11.7427          517       0.0227

TOTALS                      13.0142          532


The variance ratio here gives a value of z equal to 0.659, which is significant. Thus the data are not homogeneous.

We now require to decide whether the departure from homogeneity is due to either breed or sex or to a combination of the two. For sex-differences we have found an estimate of v equal to 0.3032 with one d.f. Comparing this with the independent residual from Table 24.3 of 0.0227 with 517 d.f., we find that the effect of sex is significant. Similarly, for breed, the estimate of v is 0.0865 for 7 d.f., which again is significant. We conclude that both breed and sex influence the departure from homogeneity.
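
The two estimates of v obtained from the unweighted means can be recomputed directly from the numbers and totals of Table 24.1. The sketch below follows (24.17) to (24.22); the function name `weighted_estimate` and the other identifiers are ours, and small differences in the last figure may arise from rounding in the printed table.

```python
# (breed, female number, female total, male number, male total) from Table 24.1.
table = [
    ("Hampshire", 33, 66.55, 89, 181.04),
    ("Duroc Jersey", 51, 98.69, 141, 281.43),
    ("Tamworth", 13, 25.90, 17, 34.20),
    ("Yorkshire", 4, 7.62, 9, 17.58),
    ("Berkshire", 8, 14.64, 4, 8.20),
    ("Poland China", 15, 28.11, 32, 64.42),
    ("Chester White", 35, 66.90, 47, 90.52),
    ("Others", 12, 23.32, 23, 46.70),
]


def weighted_estimate(means, weights):
    """Weighted sum of squares about the weighted mean, divided by (classes - 1)."""
    centre = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    ss = sum(w * (m - centre) ** 2 for w, m in zip(weights, means))
    return ss / (len(means) - 1)


female = [tf / nf for _, nf, tf, _, _ in table]  # subclass means, female
male = [tm / nm for _, _, _, nm, tm in table]    # subclass means, male
n_breeds, n_sexes = len(table), 2

# Sex: unweighted means over the 8 breeds, with effective numbers N_j.
sex_means = [sum(female) / n_breeds, sum(male) / n_breeds]
sex_numbers = [
    n_breeds**2 / sum(1 / nf for _, nf, _, _, _ in table),
    n_breeds**2 / sum(1 / nm for _, _, _, nm, _ in table),
]
v_sex = weighted_estimate(sex_means, sex_numbers)

# Breed: unweighted means over the 2 sexes, with effective numbers M_k.
breed_means = [(f + m) / n_sexes for f, m in zip(female, male)]
breed_numbers = [n_sexes**2 / (1 / nf + 1 / nm) for _, nf, _, nm, _ in table]
v_breed = weighted_estimate(breed_means, breed_numbers)

print(f"N_1 = {sex_numbers[0]:.4f}, N_2 = {sex_numbers[1]:.4f}")
print(f"estimate for sex = {v_sex:.4f}, estimate for breed = {v_breed:.4f}")
```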
It is particularly important to note that since the estimates between breeds and between sex are dependent, we cannot analyse the variance as follows:

TABLE 24.4
Incorrect Form of Analysis of Variance of Data of Table 24.1.

                        Sum of Squares.     d.f.     Quotient.
Between sexes                0.3032            1       0.3032
Between breeds               0.6056            7       0.0865
"Interaction"                0.3627            7       0.0518
Residual                    11.7427          517       0.0227

TOTALS                      13.0142          532

In fact the term shown as "interaction", calculated so as to make the sums of squares and degrees of freedom additive in the usual way, is not an unbiassed estimate of v. This is a critical point of difference between the orthogonal and the non-orthogonal case.


24.9. Suppose that the homogeneity test has shown the existence of significant class-effects. As before, we turn to consider the hypothesis that the data can be expressed as the sum of A- and B-effects separately with a random normal residual. Let $x_{jkl}$ be the typical member of the (j, k)th subclass, l varying from 1 to $n_{jk}$. Our hypothesis is then

$$x_{jkl} = a_j + b_k + \zeta_{jkl}, \qquad (24.23)$$

where $\zeta$ is normal with variance v. For convenience we will regard the mean of $\zeta$ as absorbed in the coefficients a, so that we may take $\zeta$ to have zero mean.

The usual process of estimation of the a's and b's leads to the minimisation of the sum over all N values of

$$\sum (x_{jkl} - a_j - b_k)^2.$$

Differentiating with respect to $a_j$ and $b_k$, we find the series of equations

$$\sum_k {\sum_l}' (x_{jkl} - a_j - b_k) = 0, \quad j = 1, \ldots p$$
$$\sum_j {\sum_l}' (x_{jkl} - a_j - b_k) = 0, \quad k = 1, \ldots q \qquad (24.24)$$

where $\sum'$ denotes summation over the $n_{jk}$ values in a subclass. These equations reduce to

$$\sum_k n_{jk} a_j + \sum_k n_{jk} b_k = \sum_k n_{jk} \bar{x}_{jk}$$
$$\sum_j n_{jk} a_j + \sum_j n_{jk} b_k = \sum_j n_{jk} \bar{x}_{jk}. \qquad (24.25)$$

Writing $N_{j.}$ for $\sum_k n_{jk}$ and $N_{.k}$ for $\sum_j n_{jk}$, we have

$$N_{j.} a_j + \sum_k n_{jk} b_k = \sum_k n_{jk} \bar{x}_{jk}, \quad j = 1, \ldots p \qquad (24.26)$$
$$\sum_j n_{jk} a_j + N_{.k} b_k = \sum_j n_{jk} \bar{x}_{jk}, \quad k = 1, \ldots q. \qquad (24.27)$$

To which we may add

$$\sum b_k = 0. \qquad (24.28)$$

Had we chosen to absorb the mean of $\zeta$ into the b's, this last equation would be replaced by $\sum a_j = 0$.

When all the n’s are equal these equations reduce to the orthogonal case, and each 
a- or b-coefficient can be independently estimated. In the contrary case the equations 
have to be solved as they stand. 


Example 24.2 

Returning to the data of Table 24.1, we find for equations (24.26) and (24.27) the following, the values of the constants required being obtainable from the body or marginal sums of the table itself:

171a_1 + 33b_1 + 51b_2 + 13b_3 + 4b_4 + 8b_5 + 15b_6 + 35b_7 + 12b_8 = 331.73
362a_2 + 89b_1 + 141b_2 + 17b_3 + 9b_4 + 4b_5 + 32b_6 + 47b_7 + 23b_8 = 724.09

33a_1 + 89a_2 + 122b_1 = 247.59
51a_1 + 141a_2 + 192b_2 = 380.12
13a_1 + 17a_2 + 30b_3 = 60.10
4a_1 + 9a_2 + 13b_4 = 25.20
8a_1 + 4a_2 + 12b_5 = 22.84
15a_1 + 32a_2 + 47b_6 = 92.53
35a_1 + 47a_2 + 82b_7 = 157.42
12a_1 + 23a_2 + 35b_8 = 70.02

To which we may add a_1 + a_2 = 0.
The solutions are
− a_1 = a_2 = 0.026,507;
b_1 = 2.017,259; b_2 = 1.967,367; b_3 = 1.999,799; b_4 = 1.928,267;
b_5 = 1.912,169; b_6 = 1.959,136; b_7 = 1.915,877; b_8 = 1.992,241.

These give us the "best" estimates of the mean effects of sex and breed on the hypothesis expressed by (24.23).

The mean of the b's is 1.961,514 which may be taken as an estimate of the mean of $\zeta$, the b-effects then being the differences of the above b-values from this mean.

24.10. Let us now consider the analysis of variance in the non-orthogonal case, 
when constants have been fitted by least squares in the above-mentioned way. 

To make the discussion clearer we will regard the estimation as relating to p constants $a_j$, related by $\sum (a_j) = 0$, q constants $b_k$, related by $\sum (b_k) = 0$, and the mean m. There are thus p + q − 1 independent constants which, in effect, provide estimates of the means of subclasses. Whatever these means really are, the residual quotient based on N − pq degrees of freedom gives an unbiassed estimator of v, the common variance. We have now to analyse the remaining sum of squares based on pq − 1 d.f.

If the true (population) values of the constants are denoted by $\alpha_j$, $\beta_k$ and $\mu$, the sum

$$\sum (x_{jkl} - \alpha_j - \beta_k - \mu)^2$$

is distributed as $v\chi^2$ with N degrees of freedom. Developing yet another variation on a familiar theme, we show that the corresponding quantity

$$\sum (x_{jkl} - a_j - b_k - m)^2 = \sum (x_{jkl} - \alpha_j - \beta_k - \mu)^2 - \sum (a_j - \alpha_j)^2 - \sum (b_k - \beta_k)^2 - \sum (m - \mu)^2 \qquad (24.29)$$

is distributed as $v\chi^2$ with N − (p + q − 1) d.f.

In fact, equations (24.26) and (24.27) show that the estimators a, b (and in our present case m also) are linear in the variables x. We can then find p + q − 1 orthogonal normal variables in terms of which they can be expressed. Their sum of squares will be distributed as $v\chi^2$ with p + q − 1 degrees of freedom (not some multiple of $\chi^2$ because the mean value must be p + q − 1 in virtue of 18.17). Thus the remaining term $\sum (x_{jkl} - a_j - b_k - m)^2$ is distributed as $v\chi^2$ with N − (p + q − 1) degrees of freedom, independently of the portion due to the constants a, b and m.

Furthermore, the actual reduction in sums of squares, equivalent to the sum of the last three terms in (24.29), may be easily determined. Precisely as in the similar problem of evaluating residuals in a regression equation, we have

$$\sum (x_{jkl} - a_j - b_k - m)^2 = \sum x_{jkl}^2 - \sum_j \Bigl(a_j \sum_{k,l} x_{jkl}\Bigr) - \sum_k \Bigl(b_k \sum_{j,l} x_{jkl}\Bigr) - m \sum x_{jkl}, \qquad (24.30)$$

where, of course, summation takes place over all values.


24.11. The total sum of squares is already calculated about the estimated mean m, so that the reduction for the term $\sum m^2 = N \bar{x}_{..}^2$ has already been taken into account. The total sum is then distributed as $v\chi^2$ with N − 1 d.f., as we already know. We know further that we can split off the independent residual sum based on N − pq degrees of freedom. This leaves us with a sum based on pq − 1 d.f. From the previous section it follows that we can analyse this sum into two parts: (a) the sum of squares due to fitting the constants $a_j$ and $b_k$, accounting for p + q − 2 d.f., and (b) the remainder based on pq − 1 − (p + q − 2) = (p − 1)(q − 1) d.f. This remainder is independent of the sum of squares due to fitting constants and provides an unbiassed estimator of v. If the ratio, as compared with the residual based on N − pq d.f., is significant, the hypothesis of additive effects breaks down. In short, we may regard this quantity as an interaction term.


24.12. One important point to notice in this connection is that the interaction term 
depends on whether p + q — 2 or fewer constants are fitted. In the orthogonal case we 
can determine an interaction term once and for all, however things stand in regard to the 
estimation of inter-class effects ; but for non-orthogonal data the number of class-effects 
estimated affects the interaction term, and if necessary a new significance test has to be 
applied if further estimates are calculated. The situation is similar to the testing of 
regression coefficients when orthogonal polynomials are not employed. 


Example 24.3 


Returning again to the data discussed in Examples 24.1 and 24.2, let us regard the
means in all 16 subclasses as simultaneously under estimate. For the reduction in sum
of squares due to the constants we find, using the values of a and b found in Example 24.2,

$$0.026507\,(-331.73 + 724.09) + (2.017259 \times 247.59) + (1.967367 \times 380.12) + \dots - \frac{(1055.82)^2}{533} = 1.04146.$$

Here, for instance, the sum $\sum a_j x$ is given by multiplying $a_j$ by the term $\sum x$ already
found. The last term removes the effect of including the mean among the $b$'s.


The sum of squares between classes was found in Example 24.1 to be 1.2715, based
on 15 d.f. We then have

                                             Sum of Squares.   d.f.   Quotient.
    Sex and breed (estimation of constants)      1.0415          8     0.1302
    Interaction                                  0.2300          7     0.0329
    Between classes                              1.2715         15

Comparing the interaction term 0.0329 (7 d.f.) with the residual 0.0229 (517 d.f.) we see
that it is not significant.

If we neglect sex and consider breed alone, we have only to estimate eight constants
$b_1, \dots, b_8$ subject to $\sum (b) = 0$. The sum of squares for breed alone is given by

$$\frac{1}{n_{\cdot 1}}(247.59)^2 + \frac{1}{n_{\cdot 2}}(380.12)^2 + \dots - \frac{1}{533}(1055.82)^2 = 0.7253,$$

where $n_{\cdot k}$ is the number of members in the $k$th breed class. Similarly the sum of
squares for sex alone will be found to be 0.4224. We have the following analysis:


TABLE 24.5
Further Analysis of Variance of Data of Table 24.1.

                                               Sum of Squares.   d.f.   Quotient.
    Test for Sex
      Between breed (estimation of constants)      0.7253          7       -
      Sex                                          0.3162          1     0.3162
      Sex and breed                                1.0415          8       -
    Test for Breed
      Between sex (estimation of constants)        0.4224          1       -
      Breed                                        0.6191          7     0.0884
      Sex and breed                                1.0415          8       -
    Interaction                                    0.2300          7     0.0329
    Between classes                                1.2715         15


Here, for instance, if we test for sex there are seven independent constants for breed 
and one for sex, the latter being the only one that interests us; and similarly for breed. 
On comparison with the residual 0.0227 both sex and breed are found to be significant.


24.13. The reader may perhaps find the various tests of Examples 24.1 and 24.3
confusing, and we accordingly summarise our results for the case of unequal numbers in
subclasses.

In every case, except where each subclass contains not more than one member, an
estimate of the common variance $v$ may be obtained, with $N - pq$ d.f., by pooling the
sums of squares within the $pq$ subclasses. Call this $v_1$.

Homogeneity may then be tested (a) by considering the $pq$ classes as a single one-way
classification and comparing the quotient between means with $v_1$, or (b) by calculating
for either classification separately the estimates based on (24.19) and comparing them with $v_1$.

If homogeneity is rejected in favour of the additive effect of classes expressed by the
usual hypothesis, the sum of squares between all classes based on $pq - 1$ d.f. may be split
into independent sums related to the fitting of the constants and to an interaction term.
The latter can be compared with $v_1$ to test for interaction. If this is not significant,
alternative tests for effects between A- and between B-classes may be derived by testing the
sum of squares attributable to the fitting of the respective constants against $v_1$. These
tests are, in effect, tests of one class neglecting the effect of the other, and may not be
accurate if the latter effect is not negligible. It is probably better to fit constants to both
classes simultaneously in the first instance.


Proportionate Frequencies 

24.14. We have previously spoken of non-orthogonal data as meaning any classification
with unequal frequencies in the subclasses, but there is one other case of unequal
frequencies for which orthogonality exists, namely the one in which frequencies are
proportionate, i.e. there are marginal frequencies $l_j$, $m_k$ such that

$$n_{jk} = l_j m_k. \tag{24.31}$$

Here the means of A-classes are estimates of the individual corresponding $a$'s (though it
must not be overlooked that they are based on different numbers of members in margins),
and the sum of squares between A-means may be computed in the usual manner appropriate
to a one-way classification with unequal numbers. Similarly for B. The interactions
may be estimated by subtracting the A- and B-sums from the sum of squares between
classes. We leave it to the reader to verify these statements.
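
One way to carry out the verification left to the reader is numerical. The sketch below is illustrative only: the function names, the subclass means and both frequency tables are hypothetical. It checks that, when $n_{jk} = l_j m_k$, the sum of squares between subclass means splits exactly into the A-sum, the B-sum and a non-negative remainder, and that the split fails for a disproportionate table.

```python
def decompose(n, xbar):
    """Split the sum of squares between subclass means.

    n[j][k]    -- frequency in A-class j and B-class k
    xbar[j][k] -- mean of that subclass
    Returns (between, ss_a, ss_b, remainder), where `remainder` is the weighted
    sum of squares of xbar[j][k] - A-mean - B-mean + grand mean.
    """
    p, q = len(n), len(n[0])
    cells = [(j, k) for j in range(p) for k in range(q)]
    N = sum(n[j][k] for j, k in cells)
    grand = sum(n[j][k] * xbar[j][k] for j, k in cells) / N
    n_a = [sum(n[j]) for j in range(p)]
    n_b = [sum(n[j][k] for j in range(p)) for k in range(q)]
    mean_a = [sum(n[j][k] * xbar[j][k] for k in range(q)) / n_a[j] for j in range(p)]
    mean_b = [sum(n[j][k] * xbar[j][k] for j in range(p)) / n_b[k] for k in range(q)]
    between = sum(n[j][k] * (xbar[j][k] - grand) ** 2 for j, k in cells)
    ss_a = sum(n_a[j] * (mean_a[j] - grand) ** 2 for j in range(p))
    ss_b = sum(n_b[k] * (mean_b[k] - grand) ** 2 for k in range(q))
    remainder = sum(n[j][k] * (xbar[j][k] - mean_a[j] - mean_b[k] + grand) ** 2
                    for j, k in cells)
    return between, ss_a, ss_b, remainder


means = [[1.0, 2.5, 4.0], [3.0, 2.0, 6.5]]     # hypothetical subclass means
proportionate = [[2, 3, 5], [4, 6, 10]]         # n_jk = l_j * m_k, l = (1, 2), m = (2, 3, 5)
disproportionate = [[2, 3, 5], [10, 6, 4]]      # same layout, frequencies not proportionate


def gap(n):
    """Between-classes sum minus (A-sum + B-sum + remainder); zero if orthogonal."""
    between, ss_a, ss_b, remainder = decompose(n, means)
    return between - (ss_a + ss_b + remainder)


print(gap(proportionate), gap(disproportionate))
```

The gap vanishes (up to rounding) only for the proportionate table, which is the sense in which orthogonality survives unequal frequencies.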


Special case of 2 x 2 . . . Classification 


24.15. The foregoing analysis can be extended to the n-way classification, but in
the general case the solution of the equations becomes rather complex and the arithmetic
a considerable nuisance. Where, however, the classifications are simple dichotomies the
problem simplifies to a great extent. For instance, in equations (24.27), if there are only
two values of $a_j$, which we may take to be $+a$ and $-a$, we have

$$n_{\cdot k}\, b_k = n_{1k}\bar{x}_{1k} + n_{2k}\bar{x}_{2k} - n_{1k}\,a + n_{2k}\,a.$$

We have selected the $a$'s so that $\sum a_j = 0$, which implies that the mean $m$ is amalgamated
with the $b$'s. Substituting for the $b$'s in (24.26), we find

$$a \sum_k \left\{ n_{\cdot k} - \frac{(n_{1k} - n_{2k})^2}{n_{\cdot k}} \right\} = \sum_k \left\{ n_{1k}\bar{x}_{1k} - n_{2k}\bar{x}_{2k} - \frac{(n_{1k} - n_{2k})(n_{1k}\bar{x}_{1k} + n_{2k}\bar{x}_{2k})}{n_{\cdot k}} \right\},$$

which reduces to

$$2a \sum_k \frac{n_{1k} n_{2k}}{n_{1k} + n_{2k}} = \sum_k \frac{n_{1k} n_{2k}}{n_{1k} + n_{2k}}\,(\bar{x}_{1k} - \bar{x}_{2k}). \tag{24.32}$$

Thus $2a$ is the weighted mean of the differences of corresponding B-class means, so that
$a$ may be determined direct. So generally for a 2 × 2 × 2 ... classification. The differences
may be tested for homogeneity by the z-test, which in this case reduces to the t-test.


24.16. In view of the relative complexity of the non-orthogonal case, it is natural
to wonder whether any serious error would be committed if we regarded the p × q table
of array means as an ordinary two-way table with one member in each class and analysed
the variance accordingly. Evidently such a procedure sacrifices a lot of information about
variation in subclasses, but that is not the point. Is the analysis valid?

The hypothesis on which the analysis is based is equality of variance in subclasses. 
If the numbers in subclasses are very unequal the means based on them will have very 
unequal variances, and we expect that the analysis may be misleading. If, however, the 
numbers are close to equality the analysis will probably be approximately correct. 


Example 24.4

Reverting once again to the data considered in earlier examples, we have the following
analysis for the variance of the 2 × 8 table of class-means:

                        Sum of Squares.   d.f.   Quotient.
    Between sex             0.3032          1     0.3032
    Between breed           0.2635          7     0.0376
    Residual                0.2387          7     0.0341
    Totals                  0.8054         15

The sum of squares between sex is the same as before, as it must be for a dichotomy,
but the effect of breed is seriously underestimated and would not be judged significant by
comparison with the interaction term, which is our residual. The numbers in the
breed-classes are, in fact, too different to justify the approximation.


The Missing Plot Technique 

24.17. The simplicity of the analysis of variance in the orthogonal case and the 
economy imported by keeping the number of values as low as possible often leads to the 
carrying out of experiments with only one member in each subclass. But this has a certain
practical danger in that the value in a subclass may be lost through circumstances beyond
the experimenter’s control. For instance, an animal may die in the course of an experiment, 
or a crop on a particular plot may be ruined by pest; or sometimes a record may actually 
be lost after measurements have been carried out. In such cases we may estimate the 
missing values and perform a variance-analysis in the following way. 


24.18. Consider in the first place a p × q classification with certain missing values,
r in number. We assume as usual that the variate-values are expressible in the form

$$x_{jk} = a_j + b_k + \varepsilon_{jk} + m, \tag{24.33}$$

and we know that the "best" estimators of the constants are

$$m = \bar{x}_{..}, \qquad a_j = \bar{x}_{j.} - \bar{x}_{..}, \qquad b_k = \bar{x}_{.k} - \bar{x}_{..}. \tag{24.34}$$

The quantities on the right are, however, unknown to us because of the missing values.
Suppose that we estimate the constants by minimising

$$\sum{}' (x_{jk} - a_j - b_k - m)^2, \tag{24.35}$$

where the summation $\sum'$ takes place over known values. Our estimators are then
determinate and may be written $a'_j$, $b'_k$ and $m'$.

We will now estimate the missing value on the plot $(j, k)$ by the equation

$$X_{jk} = a'_j + b'_k + m'. \tag{24.36}$$

We have

$$\sum (x_{jk} - a_j - b_k - m)^2 = \sum{}' (x_{jk} - a_j - b_k - m)^2 + \sum_r (X_{jk} - a_j - b_k - m)^2. \tag{24.37}$$

Let us now consider this as a function to be minimised, involving the unknowns a, b, m
and r further unknowns $X_{jk}$. The equations giving the latter will be obtained by
differentiating (24.37) with respect to each $X_{jk}$ and in fact are typified by

$$X_{jk} = a_j + b_k + m,$$

that is to say, by (24.36). The other constants are given by such equations as

$$\sum{}' (x_{jk} - a_j - b_k - m) + \sum_r (X_{jk} - a_j - b_k - m) = 0. \tag{24.38}$$

The second term vanishes, and hence we obtain the same minimal values for $a_j$, $b_k$ and
$m$ as by minimising (24.35) by itself. Furthermore, the equations of estimation (24.38)
may be written

$$\sum (x_{jk} - a_j - b_k - m) = 0, \tag{24.39}$$

where the summation takes place over all values, those of the observed $x$'s where known
and over the estimated $X$'s where values are missing.

It follows that if we write $X_{jk}$ for the r missing values, ascertain the residual sum of
squares, which will be a function of observations and these r unknowns, and minimise
it for variation in these unknowns, we shall obtain equations providing estimates of the
unknowns equivalent to (24.36). The following example illustrates the method.


Example 24.5 (Yates, 19330)

The following table shows the measurements of intensity of infection of certain potato
tubers under eight manurial treatments in ten blocks.

TABLE 24.6
Intensity of Infection of Potato Tubers.

    Treat-                                 Blocks
    ments     1      2      3      4      5      6      7      8      9     10     Totals
      1     3.55   2.29    b     2.00   3.34   3.83   3.86   3.50   2.23   2.91    27.51 + b
      2     2.30   4.03   2.54   2.82   3.29   2.93    f     2.55   2.20   2.30    24.96 + f
      3     3.96   3.62   3.46   2.50   2.94   3.70   3.82   2.54   3.18   3.69    33.41
      4     2.99   3.99   2.90   3.97   4.49   4.70   3.86    h     3.50   3.59    33.99 + h
      5      a     3.07   3.49   1.07   3.99   3.48   3.80   3.68   3.24   2.70    28.52 + a
      6     2.36   3.47   2.64   3.17   3.26   3.28    g      i     3.07   3.12    24.37 + g + i
      7     2.16   2.34   1.96   2.60   3.77    d     3.20   3.47   2.67   3.33    25.50 + d
      8     3.16   2.52   2.39   3.68    c      e     3.85   3.36   2.50   4.13    25.59 + c + e
    Totals  20.48  25.33  19.38  21.81  25.08  21.92  22.39  19.10  22.59  25.77   223.85 + a + b + c + d + e
             + a           + b           + c   + d + e + f + g + h + i                   + f + g + h + i

There are nine missing values in this table, indicated by the letters a ... i. Omitting
purely numerical terms, which are irrelevant for the purposes of minimisation, we have
for the total sum of squares,

$$a^2 + b^2 + c^2 + \dots + i^2 - \tfrac{1}{80}(223.85 + a + b + c + \dots + i)^2;$$

for the sum of squares between blocks,

$$\tfrac{1}{8}\{(20.48 + a)^2 + (19.38 + b)^2 + \dots + (19.10 + h + i)^2\} - \tfrac{1}{80}(223.85 + a + b + c + \dots + i)^2;$$

and for that between treatments,

$$\tfrac{1}{10}\{(27.51 + b)^2 + (24.96 + f)^2 + \dots + (25.59 + c + e)^2\} - \tfrac{1}{80}(223.85 + a + b + c + \dots + i)^2.$$

The residual sum of squares is the difference of the first and the sum of the second and
third of these expressions. For minimisation we differentiate with respect to a, b, ... i
in turn. On some arithmetic simplification we find

    63a +   b +   c +   d +   e +   f +   g +   h +   i = 209.11
      a + 63b +   c +   d +   e +   f +   g +   h +   i = 190.03
      a +   b + 63c +   d -  7e +   f +   g +   h +   i = 231.67
      a +   b +   c + 63d -  9e +   f +   g +   h +   i = 199.35
      a +   b -  7c -  9d + 63e +   f +   g +   h +   i = 200.07
      a +   b +   c +   d +   e + 63f -  9g +   h +   i = 199.73
      a +   b +   c +   d +   e -  9f + 63g +   h -  7i = 195.01
      a +   b +   c +   d +   e +   f +   g + 63h -  9i = 239.07
      a +   b +   c +   d +   e +   f -  7g -  9h + 63i = 162.11
This set of linear equations can, of course, be solved by routine methods, but also by iterative
processes as follows:

The mean of existent values is 3.15. Assume this to be approximately the values of
b, c ... i. Then for a we have, from the first of the above equations,

$$a = \tfrac{1}{63}\{209.11 - (8 \times 3.15)\} = 2.92.$$

Taking this value of a and 3.15 for c, d ... i, we find for b from the second equation,

$$b = \tfrac{1}{63}\{190.03 - (7 \times 3.15) - 2.92\} = 2.62.$$

Similarly, from the third equation,

$$c = \tfrac{1}{63}\{231.67 + (2 \times 3.15) - 2.92 - 2.62\} = 3.69,$$

and so on. On reaching i we recalculate a from the first equation, using the approximations
to the values of the other constants already obtained; and so on until our values do not
alter. In this case only a second approximation is necessary, the values being:

                a      b      c      d      e      f      g      h      i
    First     2.92   2.62   3.69   3.27   3.76   3.26   3.60   3.88   3.22
    Second    2.88   2.58   3.73   3.33   3.76   3.32   3.60   3.89   3.22

These are our estimates of missing yields. The treatment means are found to be:

        1       2       3       4       5       6       7       8
      3.009   2.828   3.341   3.788   3.140   3.120   2.883   3.308
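
The iterative process of Example 24.5 can be sketched in Python as follows. This is an illustration under stated assumptions, not Yates's own procedure: the names `missing`, `build_equations` and `gauss_seidel` are hypothetical, and the rule used to build the coefficients (63 on the diagonal, -9 for two missing plots in the same block, -7 for two in the same treatment, 1 otherwise, with right-hand side 10 × block total + 8 × treatment total - grand total) is our own working of the differentiation described above. It reproduces the nine equations of the example.

```python
# Missing plots of Table 24.6: letter -> (treatment, block).
missing = {"a": (5, 1), "b": (1, 3), "c": (8, 5), "d": (7, 6), "e": (8, 6),
           "f": (2, 7), "g": (6, 7), "h": (4, 8), "i": (6, 8)}
# Totals of the observed values only, copied from the margins of Table 24.6.
treatment_total = {1: 27.51, 2: 24.96, 3: 33.41, 4: 33.99,
                   5: 28.52, 6: 24.37, 7: 25.50, 8: 25.59}
block_total = {1: 20.48, 2: 25.33, 3: 19.38, 4: 21.81, 5: 25.08,
               6: 21.92, 7: 22.39, 8: 19.10, 9: 22.59, 10: 25.77}
grand_total = 223.85
p, q = 8, 10                       # number of treatments, number of blocks
letters = sorted(missing)          # 'a' ... 'i'


def build_equations():
    """Minimising equations pq*X - q*B - p*T + G = 0, one per missing value X.

    B, T and G are the block, treatment and grand totals including the unknowns.
    """
    A, rhs = [], []
    for u in letters:
        t_u, b_u = missing[u]
        row = []
        for v in letters:
            t_v, b_v = missing[v]
            if u == v:
                row.append((p - 1) * (q - 1))    # 63 on the diagonal
            elif b_u == b_v:
                row.append(1 - q)                # -9: same block
            elif t_u == t_v:
                row.append(1 - p)                # -7: same treatment
            else:
                row.append(1)
        A.append(row)
        rhs.append(q * block_total[b_u] + p * treatment_total[t_u] - grand_total)
    return A, rhs


def gauss_seidel(A, rhs, start, sweeps):
    """Return the approximation after each sweep, using new values as soon as found."""
    x = list(start)
    history = []
    for _ in range(sweeps):
        for r in range(len(x)):
            others = sum(A[r][c] * x[c] for c in range(len(x)) if c != r)
            x[r] = (rhs[r] - others) / A[r][r]
        history.append(list(x))
    return history


A, rhs = build_equations()
history = gauss_seidel(A, rhs, [3.15] * len(letters), sweeps=50)
for name, first, final in zip(letters, history[0], history[-1]):
    print(name, round(first, 2), round(final, 2))
```

Starting every unknown at the mean 3.15 of the existent values, the first sweep corresponds to the first approximation of the example; further sweeps change the values only slightly, which is why so few approximations are needed.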
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24.19. The question now arises how we may analyse the variance of data for which 
missing values have been estimated in this way. 

The original data provided a classification with unequal numbers in subclasses and
can be analysed by the methods given earlier in the chapter; except that, since no subclass
contains more than one member, we cannot find a residual sum of squares within subclasses
based on $N - pq$ d.f. ($N - pq$, in fact, is a negative number.) For instance,
regarding the data as a one-way classification with $pq - r$ classes, we shall have an analysis
of this type:

                          Sums of squares      d.f.
    Between classes*                           p + q - 2
    Residual                                   (p - 1)(q - 1) - r          (24.40)
    Total                                      pq - r - 1

The effect of the two classifications separately can be dealt with in the manner of
Example 24.1.


24.20. Two simplifications are possible. In the first place, since the minimisation 
of the residual is the same for the original data as for the data completed by estimates of 
missing values, we can use the latter to compute the residual precisely as for an orthogonal 
case, which simplifies the arithmetic. 

Secondly, it appears that to an adequate approximation we may substitute the esti- 
mated values for missing values and analyse the resulting material in the ordinary way 
as if it were orthogonal. If the proportion of missing values is high this approximation 
may perhaps break down, and in practice we should probably regard the experiment as 
ruined. More usually only a few records are missing, and the effect of replacing them by 
estimates is hardly likely to affect judgments of significance seriously. 


Example 24.6 


Continuing the analysis of the data of the previous example, we find, for the total sum
of squares, 32.1012 with 70 d.f. The analysis of the completed data, that is to say the original
data plus the estimates of missing values, is as follows:

                             Sum of Squares.   d.f.   Quotient.
    Between blocks                9.7176         9     1.0797
    Between treatments            6.5812         7     0.9402
    Residual                     17.6902        54     0.3276
    Totals                       33.9890        70

* It is assumed that no row or column in the two-way classification is entirely empty. If it were,
we should have to ignore it and confine attention to the remaining arrays.

Treating the original data as a case of unequal class numbers we find:

                                      Sum of Squares.   d.f.   Quotient.
    Between blocks and treatments         14.4110        16     0.9007
    Residual                              17.6902        54     0.3276
    Totals                                32.1012        70

For blocks only:

                             Sum of Squares.   d.f.   Quotient.
    Between blocks                8.5690         9     0.9521
    Remainder                     5.8420         7     0.8346
    Blocks and treatments        14.4110        16

For treatments only:

                             Sum of Squares.   d.f.   Quotient.
    Between treatments            6.2648         7     0.8950
    Remainder                     8.1462         9     0.9051
    Blocks and treatments        14.4110        16


Whether we use the analysis of completed data or the more exact form, we see that
differences between blocks and between treatments are significant as judged by the residual
variance. The two analyses are, in fact, not very different, and even with as many as nine
missing values out of 80 we should not err by substituting estimated values and treating
the data as orthogonal.
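
The comparison of the two analyses can be made concrete by forming the variance ratios from the tabulated sums of squares. The following sketch is illustrative: the helper `ratio` is hypothetical, and it assumes (our reading of the tables, on the pattern of Table 24.5) that each "Remainder" line is the sum of squares for one classification after the constants of the other have been fitted.

```python
residual_ms = 17.6902 / 54            # residual quotient on 54 d.f.


def ratio(sum_of_squares, df):
    """Quotient for an effect divided by the residual quotient."""
    return (sum_of_squares / df) / residual_ms


# Completed data, treated as if orthogonal.
completed = {"blocks": ratio(9.7176, 9), "treatments": ratio(6.5812, 7)}
# More exact form: each effect after the other classification has been fitted.
exact = {"blocks": ratio(8.1462, 9), "treatments": ratio(5.8420, 7)}

for effect in ("blocks", "treatments"):
    print(effect, round(completed[effect], 3), round(exact[effect], 3))
```

Each ratio would be referred to Fisher's distribution with 9 (blocks) or 7 (treatments) and 54 degrees of freedom; the completed-data ratios are somewhat the larger of the two sets.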


Relationship with Regression Analysis 

24.21. The general n-way classifications to which variance-analysis may be applied 
are not necessarily determined by a measurable variate. As for contingency tables, rows 
or columns can be interchanged without affecting the analysis. We can, however, regard 
a multivariate frequency table as an n-way classification and apply variance-analysis to 
it; and just as regression and correlation analysis provide a refinement on contingency 
analysis because of the arrangement of the classes in order by reference to a variate, so we 
may to some extent refine the analysis of variance in such a case. 


24.22. Consider in the first instance a p × q table of frequencies in the form of a
correlation table. We will suppose the A-classification to be according to the variate x
and the B-classification according to y. Let us now consider the hypothesis that the data
emanate from a normal bivariate population with zero correlation (or, somewhat more
generally, that for any given y the x's are distributed normally with the same mean and
variance). We can then regard the data as a one-way classification according to y with
unequal frequencies and analyse the variance in the usual form:

                         Sum of Squares.                        d.f.       Quotient.
    Between classes      $\sum n_j (\bar{x}_j - \bar{x})^2$     $q - 1$    $\dfrac{N\eta^2}{q-1}\operatorname{var} x$
    Residual             $\sum (x_{ij} - \bar{x}_j)^2$          $N - q$    $\dfrac{N(1-\eta^2)}{N-q}\operatorname{var} x$
    Totals               $N \operatorname{var} x$               $N - 1$

Here $\bar{x}_j$ is the mean of $n_j$ x-values in the jth y-class, $\bar{x}$ is the mean of all N values, $x_{ij}$ is the
variate-value in the ith x-class and jth y-class, and there are q y-classes. The quotients
are expressible in terms of the correlation ratio of x on y, viz. $\eta$ (cf. 14.23, vol. I, p. 351).
Now, on our hypothesis, the sums of squares between classes and the residual are
independently distributed in the Type III form, and hence the variance ratio

$$\frac{\eta^2}{q - 1} \cdot \frac{N - q}{1 - \eta^2} \tag{24.41}$$

can be tested in Fisher's distribution with $\nu_1 = q - 1$, $\nu_2 = N - q$. This is the test we
gave in 14.25 (vol. I, p. 353) and it is reached by an argument of essentially the same
kind.


24.23. Now suppose that our p × q table is normal but correlated; or, somewhat
more generally, that the values in arrays of constant y are normally distributed with the
same variance but with means which vary linearly with y, say

$$m_j = m + b y_j. \tag{24.42}$$

Then our data can be represented by the form

$$x_{ij} = m + b y_j + \varepsilon_{ij}, \tag{24.43}$$

where the $\varepsilon$'s are distributed normally with zero mean and the same variance $v$. Apart
from the constant $m$, the only unknown here is the constant $b$. Our least-squares estimates
(measuring from the means of x and y) now lead to the familiar form for the regression
coefficient

$$b = \frac{\sum x y}{\sum y^2}, \tag{24.44}$$

where summation takes place over all values observed. This is, of course, equivalent to

$$b = \frac{\operatorname{cov}(x, y)}{\operatorname{var} y}. \tag{24.45}$$


Further, the reduction in sum of squares attributable to fitting the constant $b$ is

$$N b \operatorname{cov}(x, y) = \frac{N \operatorname{cov}^2(x, y)}{\operatorname{var} y} = N r^2 \operatorname{var} x, \tag{24.46}$$

where $r$ is the correlation coefficient of the sample.
Our analysis of variance may then be written:

TABLE 24.7
Analysis of Variance of a Correlation Table

                                                       Sum of Squares.                        d.f.       Quotient.
    Regression constant b                              $N r^2 \operatorname{var} x$           $1$        $N r^2 \operatorname{var} x$
    Between classes (after regression is eliminated)   $N(\eta^2 - r^2)\operatorname{var} x$  $q - 2$    $\dfrac{N(\eta^2 - r^2)}{q-2}\operatorname{var} x$
    Residual                                           $N(1 - \eta^2)\operatorname{var} x$    $N - q$    $\dfrac{N(1-\eta^2)}{N-q}\operatorname{var} x$
    Totals                                             $N \operatorname{var} x$               $N - 1$

This analysis gives us a test of the significance of the correlation coefficient in samples
from an uncorrelated population and also of linearity of regression.
In fact, if the parent correlation is zero, the parent value of $b$ is zero and the quotient
due to $b$ is independent of the sum of the other items in the analysis. Thus the ratio

$$\frac{N r^2 \operatorname{var} x}{N(1 - r^2)\operatorname{var} x / (N - 2)} = \frac{r^2 (N - 2)}{1 - r^2} \tag{24.47}$$

is distributed in Fisher's form with $\nu_1 = 1$, $\nu_2 = N - 2$. This is equivalent to saying that

$$\left\{ \frac{r^2 (N - 2)}{1 - r^2} \right\}^{1/2} \tag{24.48}$$

is distributed in "Student's" form with $N - 2$ d.f., which brings us back by a different
route to the test given in 14.15 (vol. I, p. 342).


24.24. Secondly, if we assume that the parent correlation is not zero but the regression
is linear, the sum of squares between classes after regression is eliminated is independent
of the residual in Table 24.7, and hence the ratio

$$\frac{N(\eta^2 - r^2)\operatorname{var} x / (q - 2)}{N(1 - \eta^2)\operatorname{var} x / (N - q)} = \frac{\eta^2 - r^2}{1 - \eta^2} \cdot \frac{N - q}{q - 2} \tag{24.49}$$

is distributed in Fisher's form with $\nu_1 = q - 2$, $\nu_2 = N - q$. This test (due to Fisher
himself) gives a test of linearity of regression in the normal case.

It should be noticed that this test is only approximate if the classification is one of
a normal population with broad groupings. If correlation exists, the distribution of a
bivariate normal sample in an array of finite width is not exactly normal, being the sum
of a number of normal distributions with slightly different means. Unless the grouping
is very coarse, this is not likely to invalidate tests of significance in practice.
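
The three variance ratios obtainable from a correlation table, (24.41), (24.47) and (24.49), depend only on N, q, r² and η², since the factor N var x cancels. A minimal Python sketch follows; the function name and the numerical inputs are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt


def correlation_table_tests(N, q, r2, eta2):
    """Variance ratios for a correlation table with q y-classes.

    N    -- number of observations
    r2   -- square of the sample correlation coefficient
    eta2 -- square of the correlation ratio of x on y
    Each entry is (ratio, nu1, nu2) for Fisher's distribution.
    """
    if not (0 <= r2 <= eta2 < 1):
        raise ValueError("need 0 <= r2 <= eta2 < 1")
    return {
        # (24.41): homogeneity of the array means
        "correlation ratio": ((eta2 / (q - 1)) / ((1 - eta2) / (N - q)), q - 1, N - q),
        # (24.47): significance of r in uncorrelated material
        "correlation": (r2 * (N - 2) / (1 - r2), 1, N - 2),
        # (24.49): linearity of regression
        "linearity": (((eta2 - r2) / (q - 2)) / ((1 - eta2) / (N - q)), q - 2, N - q),
    }


tests = correlation_table_tests(N=50, q=6, r2=0.36, eta2=0.40)   # hypothetical values
student_t = sqrt(tests["correlation"][0])                        # (24.48), on N - 2 d.f.
for name, (ratio, nu1, nu2) in tests.items():
    print(name, round(ratio, 4), nu1, nu2)
print("t", round(student_t, 4))
```

The square root of the second ratio is the "Student" form (24.48), which shows directly that the two tests of the correlation coefficient are the same test.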


24.25. Consider now the general regression formula for p variates,

$$x_1 = b_2 x_2 + b_3 x_3 + \dots + b_p x_p. \tag{24.50}$$

If we assume that the residuals $x_1 - \sum_j b_j x_j$ (say x) are distributed normally with
constant variance, our least-squares estimates of the regression coefficients are those given
by the usual theory, and the fitting of $(p - 1)$ constants reduces the sum of squares by
$N \operatorname{var} x\, R^2$, where $R$ is the multiple correlation coefficient (cf. 15.16, vol. I, p. 380). We
then have the analysis:

                                              Sum of Squares.                      d.f.       Quotient.
    Between classes (regression constants)    $N \operatorname{var} x\, R^2$       $p - 1$    $\dfrac{R^2}{p-1}\, N \operatorname{var} x$
    Residual                                  $N \operatorname{var} x\,(1 - R^2)$  $N - p$    $\dfrac{1-R^2}{N-p}\, N \operatorname{var} x$
    Totals                                    $N \operatorname{var} x$             $N - 1$

If the regression is in fact linear of type (24.50), the residual quotient is independent of
that due to fitting regression constants, and the hypothesis may be tested by means of
the ratio

$$\frac{R^2}{p - 1} \cdot \frac{N - p}{1 - R^2},$$

which is distributed in Fisher's form with $\nu_1 = p - 1$, $\nu_2 = N - p$. This brings us to
the distribution of $R^2$ given in 15.20.
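
As a small illustration of the ratio just given, the following Python sketch computes it with its degrees of freedom. The function name and the values in the call are hypothetical.

```python
def multiple_correlation_ratio(R2, N, p):
    """Variance ratio for the regression of one variate on p - 1 others.

    R2 -- square of the multiple correlation coefficient
    N  -- number of observations
    p  -- number of variates, so that p - 1 constants are fitted
    Returns (ratio, nu1, nu2).
    """
    if not (0 <= R2 < 1) or p < 2 or N <= p:
        raise ValueError("need 0 <= R2 < 1, p >= 2 and N > p")
    nu1, nu2 = p - 1, N - p
    return (R2 / nu1) / ((1 - R2) / nu2), nu1, nu2


# Hypothetical case: 25 observations, 5 variates, R^2 = 0.5.
print(multiple_correlation_ratio(R2=0.5, N=25, p=5))
```

The ratio grows both with R² and with the number of residual degrees of freedom, so a given R² is more convincing the larger the sample in relation to the number of constants fitted.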


24.26. It is to be observed that in (24.50) we may choose the variates $x_2, \dots, x_p$
as we please. In particular, we can take them to be polynomials of a single variate. From
this point of view the analysis of variance links up with the theory of regression analysis,
given in Chapter 22. If the polynomials are orthogonal we can fit the constants b one
at a time, the fitting of any constant leaving unchanged the previous determination of those
of lower orders. The reduction in sum of squares for each constant can be separately
ascertained and corresponds to the loss of a further degree of freedom; and at any stage
we may test the residual variance to see whether any particular term is worth while in the
sense that it makes a significant contribution to the total variance. The exact test, of
course, depends on the usual assumptions of normality.

24.27. The reader is now in a position to see a number of statistical topics which
on the surface appear to be distinct as parts of a single theory. Regression analysis, with
its subsidiary of correlation analysis, proceeds by the successive fitting of constants by
least-squares. For the normal case this is equivalent to estimation by maximum likelihood.
Partial and multiple regression, together with curvilinear regression, can all be subsumed
under this central idea. The fitting of each constant splits off a separate contribution to
the total variance which, under certain hypotheses, is independent of the others.
Variance-analysis proceeds in much the same way, but is more general in the sense that it can deal
with the classification of values, however determined. Our various exact tests of significance
of homogeneity in variance, of linearity of regression, of significance of correlations
in uncorrelated material, of the difference of two means where variances are equal, of the
correlation ratios, of the multiple correlation coefficient all derive ultimately from Fisher's
distribution of the variance-ratio in the normal case.


The Analysis of Covariance 


24.28. Suppose that we have a one-way classification, possibly with unequal numbers,
and that in each class the members present values not of a single variate, such as we have
considered up to now, but pairs of variate-values typified by $x_{ij}$, $y_{ij}$, $j$ referring as usual
to class and $i$ to the number within the class. By the ordinary methods of variance-analysis
we can discuss the effect of classification either on the x-variate or on the y-variate; but
there also arises for consideration the effect of class-membership on the covariation of
x and y. This leads us to an extension of the analysis of variance to that of covariance.


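24.29. By an easy extension of the results for a single variate we have, analogously to

$$\sum_{i,j} (x_{ij} - \bar{x}_{..})^2 = \sum_{i,j} (x_{ij} - \bar{x}_{.j})^2 + \sum_j n_j (\bar{x}_{.j} - \bar{x}_{..})^2, \tag{24.51}$$

the equation in product terms

$$\sum_{i,j} (x_{ij} - \bar{x}_{..})(y_{ij} - \bar{y}_{..}) = \sum_{i,j} (x_{ij} - \bar{x}_{.j})(y_{ij} - \bar{y}_{.j}) + \sum_j n_j (\bar{x}_{.j} - \bar{x}_{..})(\bar{y}_{.j} - \bar{y}_{..}). \tag{24.52}$$

If we consider the whole sample as homogeneous the correlation between x and y is given by

$$r = \frac{\sum (x_{ij} - \bar{x}_{..})(y_{ij} - \bar{y}_{..})}{\sqrt{\left\{\sum (x_{ij} - \bar{x}_{..})^2 \sum (y_{ij} - \bar{y}_{..})^2\right\}}}. \tag{24.53}$$

We have also the correlation between means of classes

$$r_m = \frac{\sum n_j (\bar{x}_{.j} - \bar{x}_{..})(\bar{y}_{.j} - \bar{y}_{..})}{\sqrt{\left\{\sum n_j (\bar{x}_{.j} - \bar{x}_{..})^2 \sum n_j (\bar{y}_{.j} - \bar{y}_{..})^2\right\}}}, \tag{24.54}$$

and may calculate a correlation of residuals within classes

$$r_a = \frac{\sum (x_{ij} - \bar{x}_{.j})(y_{ij} - \bar{y}_{.j})}{\sqrt{\left\{\sum (x_{ij} - \bar{x}_{.j})^2 \sum (y_{ij} - \bar{y}_{.j})^2\right\}}}. \tag{24.55}$$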


24.30. If there is heterogeneity present, we should expect these correlations to differ;
and similarly for the three kinds of regression of y on x, such as

$$b_0 = \frac{\sum (x_{ij} - \bar{x}_{..})(y_{ij} - \bar{y}_{..})}{\sum (x_{ij} - \bar{x}_{..})^2}. \tag{24.56}$$

The three correlations of (24.53)-(24.55) are, however, not additive, like sums of squares;
nor are the regressions corresponding. The covariances expressed by (24.52) are additive,
but there is no simple test, such as exists for variance-ratios, to determine the significance
of differences or ratios of covariances. Covariance analysis, however, is not primarily
designed to test independence, but to examine whether there is any variation according
to class between the regressions of y on x within and between classes. Let us suppose
that there is some linear relation of the form


: DNO (Keehn LI- s, .(257) 
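Following the notation of E. S. Pearson, we write

$$C_{11j} = \sum_i (x_{ij} - \bar{x}_{.j})^2, \qquad C_{22j} = \sum_i (y_{ij} - \bar{y}_{.j})^2, \qquad C_{12j} = \sum_i (x_{ij} - \bar{x}_{.j})(y_{ij} - \bar{y}_{.j}), \tag{24.58}$$

$$C_{11a} = \sum_j C_{11j}, \qquad C_{22a} = \sum_j C_{22j}, \qquad C_{12a} = \sum_j C_{12j}, \tag{24.59}$$

$$C_{11m} = \sum_j n_j (\bar{x}_{.j} - \bar{x}_{..})^2, \qquad C_{22m} = \sum_j n_j (\bar{y}_{.j} - \bar{y}_{..})^2, \qquad C_{12m} = \sum_j n_j (\bar{x}_{.j} - \bar{x}_{..})(\bar{y}_{.j} - \bar{y}_{..}), \tag{24.60}$$

and $C_{110}$, $C_{220}$, $C_{120}$ for the corresponding total sums of squares and products. We may
then exhibit the composition of the total sums of squares and products in the form of Table
24.8. The arithmetic of the analysis follows that of ordinary variance-analysis. We
shall give an example presently.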


TABLE 24.8

Analysis of Variance and Covariance for One-Way Classification: Sums of Squares and
Products and Regression Coefficients.

    Variation.          d.f.        Sum of Squares.   Sum of Squares.   Sum of Products.   Regression
                                    x-variate.        y-variate.        xy.                Coefficients.
    Within jth group    $n_j - 1$   $C_{11j}$         $C_{22j}$         $C_{12j}$          $b_j = C_{12j}/C_{11j}$
    Within groups       $N - p$     $C_{11a}$         $C_{22a}$         $C_{12a}$          $b_a = C_{12a}/C_{11a}$
    Between groups      $p - 1$     $C_{11m}$         $C_{22m}$         $C_{12m}$          $b_m = C_{12m}/C_{11m}$
    Totals              $N - 1$     $C_{110}$         $C_{220}$         $C_{120}$          $b_0 = C_{120}/C_{110}$


We now suppose that, apart from the regression effects represented by (24.57), the
variation of x is normal with constant variance v. We can then compile various estimates
of v from the residual variation after the effect of fitting regression constants has been
removed. For instance, within classes we have for the estimator of v, with $N - 2p$ degrees
of freedom,

$$\frac{1}{N - 2p} \sum_{i,j} \{y_{ij} - \bar{y}_{.j} - b_j (x_{ij} - \bar{x}_{.j})\}^2 = \frac{1}{N - 2p} \sum_j (C_{22j} - b_j C_{12j}) = \frac{1}{N - 2p} S_1, \text{ say.}$$

The number of degrees of freedom follows from the fact that we have fitted a mean and
a regression coefficient to each of p classes, making a reduction of 2p in all. We then obtain
Table 24.9:


TABLE 24.9

Analysis of Covariance for One-Way Classification with Linear Regressions.

    Variation due to                              d.f.           Sum of Squares.

    Deviations from linear regressions            $N - 2p$       $\sum_{i,j} \{y_{ij} - \bar{y}_{.j} - b_j (x_{ij} - \bar{x}_{.j})\}^2$
      within classes                                             $= \sum_j (C_{22j} - b_j C_{12j}) = S_1$

    Differences among regressions                 $p - 1$        $\sum_{i,j} (b_j - b_a)^2 (x_{ij} - \bar{x}_{.j})^2$
                                                                 $= \sum_j b_j C_{12j} - b_a C_{12a} = S_2$

    Deviations within classes from                $N - p - 1$    $\sum_{i,j} \{y_{ij} - \bar{y}_{.j} - b_a (x_{ij} - \bar{x}_{.j})\}^2$
      linear regression $b_a$                                    $= C_{22a} - b_a C_{12a} = S_1 + S_2$

    Deviations between classes from               $p - 2$        $\sum_j n_j \{\bar{y}_{.j} - \bar{y}_{..} - b_m (\bar{x}_{.j} - \bar{x}_{..})\}^2$
      linear regression $b_m$                                    $= C_{22m} - b_m C_{12m} = S_3$

    Differences between $b_a$ and $b_m$           $1$            $\sum_{i,j} \{(b_a - b_0)(x_{ij} - \bar{x}_{.j}) + (b_m - b_0)(\bar{x}_{.j} - \bar{x}_{..})\}^2$
                                                                 $= (b_a - b_m)^2 \dfrac{C_{11a} C_{11m}}{C_{11a} + C_{11m}} = S_4$

    Total deviation from linear regression        $N - 2$        $\sum_{i,j} \{y_{ij} - \bar{y}_{..} - b_0 (x_{ij} - \bar{x}_{..})\}^2$
      $b_0$                                                      $= C_{220} - b_0 C_{120} = S_1 + S_2 + S_3 + S_4$

The reader will probably find it useful to check the expressions in the third column of
Table 24.9 and to examine how the sum of squares of deviations from the regression line
of the whole is analysed into the constituent items.


24.31. Suppose now that we wish to test whether the relationship between x and y
can be represented by the formula (24.57), and that there is no material class-effect present.
Then $S_1$ of Table 24.9 should be an unbiassed estimator of $(N - 2p)v$ and should be
independent of the residual estimator $S_2 + S_3 + S_4$, which has $2p - 2$ d.f. We may therefore
test the hypothesis by the ratio

$$\frac{S_1}{N - 2p} \cdot \frac{2p - 2}{S_2 + S_3 + S_4}, \qquad \nu_1 = N - 2p, \quad \nu_2 = 2p - 2. \tag{24.61}$$

If this variance ratio is insignificant we consider next whether the regressions differ in
the p classes. For this purpose we compare the estimator derived from $S_2$ with that based
on $S_1$; i.e. the ratio

$$\frac{S_2}{p - 1} \cdot \frac{N - 2p}{S_1}, \qquad \nu_1 = p - 1, \quad \nu_2 = N - 2p, \tag{24.62}$$

will be significant if differences are to be regarded as real.

If this ratio is not significant, $S_1$ and $S_2$ may be pooled. Comparison of their sum
with $S_3$ will afford a test whether the relation between group means is linear. The ratio
for this purpose is

$$\frac{S_1 + S_2}{N - p - 1} \cdot \frac{p - 2}{S_3}, \qquad \nu_1 = N - p - 1, \quad \nu_2 = p - 2. \tag{24.63}$$

Finally, even if this ratio is not significant, it does not follow that the common regression
within groups is the same as the regression of the means of groups. To test this point
we consider the ratio

$$\frac{S_1 + S_2}{N - p - 1} \cdot \frac{1}{S_4}, \qquad \nu_1 = N - p - 1, \quad \nu_2 = 1. \tag{24.64}$$
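
The quantities $S_1$ to $S_4$ of Table 24.9 and the four ratios of 24.31 can be computed directly from the classified pairs. The sketch below is an illustration, not the book's arithmetic: the function names and the small data set are hypothetical, and the orientation of each ratio (which quotient is taken as numerator) is an assumption of this sketch. $S_4$ is obtained by subtraction from the total deviation.

```python
def centred_sums(xs, ys):
    """Return (C11, C22, C12) about the means of the given values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    c11 = sum((x - mx) ** 2 for x in xs)
    c22 = sum((y - my) ** 2 for y in ys)
    c12 = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return c11, c22, c12


def covariance_analysis(groups):
    """groups: list of (xs, ys) pairs, one per class; at least three classes."""
    p = len(groups)
    N = sum(len(xs) for xs, _ in groups)
    per_class = [centred_sums(xs, ys) for xs, ys in groups]
    c11a, c22a, c12a = (sum(c[i] for c in per_class) for i in range(3))
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    c110, c220, c120 = centred_sums(all_x, all_y)
    # Between-class sums follow from the additivity of squares and products.
    c11m, c22m, c12m = c110 - c11a, c220 - c22a, c120 - c12a
    s1 = sum(c22 - c12 ** 2 / c11 for c11, c22, c12 in per_class)
    s2 = (c22a - c12a ** 2 / c11a) - s1
    s3 = c22m - c12m ** 2 / c11m
    s4 = (c220 - c120 ** 2 / c110) - (s1 + s2 + s3)
    within = (s1 + s2) / (N - p - 1)
    ratios = {
        "24.61": (s1 / (N - 2 * p)) / ((s2 + s3 + s4) / (2 * p - 2)),
        "24.62": (s2 / (p - 1)) / (s1 / (N - 2 * p)),
        "24.63": within / (s3 / (p - 2)),
        "24.64": within / s4,
    }
    return {"S": (s1, s2, s3, s4), "b_a": c12a / c11a, "b_m": c12m / c11m,
            "C11a": c11a, "C11m": c11m, "ratios": ratios}


# Hypothetical data: three classes of (x, y) pairs.
groups = [([1, 2, 3, 4], [2, 3, 5, 4]),
          ([2, 4, 5, 7], [3, 6, 6, 9]),
          ([3, 5, 6, 8, 9], [2, 4, 7, 7, 10])]
result = covariance_analysis(groups)
print([round(s, 4) for s in result["S"]])
print({k: round(v, 4) for k, v in result["ratios"].items()})
```

A useful check on the arithmetic is that $S_4$ found by subtraction agrees with the closed form $(b_a - b_m)^2 C_{11a} C_{11m} / (C_{11a} + C_{11m})$.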


Example 24.7 

A number of recruits are given a preliminary test to ascertain their suitability for a
certain course of training. At the end of the training course they undergo a proficiency
test. The marks for three groups of recruits from three different towns are:

    Group 1   Preliminary: 45, 50, 56, 58, 59, 60, 62, 64, 65, 75
              Proficiency: 46, 60, 52, 46, 48, 50, 55, 63, 58, 64

    Group 2   Preliminary: 44, 49, 52, 52, 58, 59, 60, 62, 63, 63, 66, 69, 70, 72, 73
              Proficiency: 48, 55, 45, 60, 65, 64, 69, 71, 77, 70, 75, 80, 72, 75, 81

    Group 3   Preliminary: 47, 52, 59, 60, 63, 66, 68, 69, 74, 76
              Proficiency: 43, 56, 51, 72, 60, 61, 55, 74, 72, 80.


We are interested here in the efficiency of the preliminary test as a predictor of the
proficiency test. We therefore consider the regression of the marks obtained in the latter
(y) on those obtained in the former (x). We are, however, also very much interested in
the question whether the regressions are the same, apart from purely sampling effects,
in the three groups. Such a matter would naturally arise, for instance, if we were thinking
of applying the same rejection standards in preliminary tests to all recruits, irrespective of
their town of origin.

Our scores are given to the nearest unit; and hence the variates are discontinuous.
We will neglect this effect and assume that the scores are distributed approximately
normally.

About origin x = y = 50 the sums of squares and cross-products are:


             n.    Σ(x).    Σ(y).    Σ(x²).    Σ(y²).    Σ(xy).
Group 1      10      94       42      1496       594       694
Group 2      15     162      257      2802      6101      3989
Group 3      10     134      124      2556      2776      2422


We can then calculate the quantities C. For instance,

$$C_{111} = 1496 - \frac{94^2}{10} = 612.4,$$

$$C_{121} = 694 - \frac{94 \times 42}{10} = 299.2,$$

$$C_{11a} = C_{111} + C_{112} + C_{113}, \text{ etc.}$$


We find the following table in the form of Table 24.8:

TABLE 24.10
Analysis of Variance and Covariance for Data of Example 24.7: Sums of Squares and Products and Regressions

Variation.              d.f.   Sum of Squares, x².     Sum of Squares, y².     Sum of Products, xy.    Regressions.
Within first group        9    $C_{111}$ = 612.4       $C_{221}$ = 417.6       $C_{121}$ = 299.2       $b_1$ = 0.4886
  ,,   second group      14    $C_{112}$ = 1052.4      $C_{222}$ = 1697.73     $C_{122}$ = 1213.4      $b_2$ = 1.1530
  ,,   third group        9    $C_{113}$ = 760.4       $C_{223}$ = 1238.4      $C_{123}$ = 760.4       $b_3$ = 1.0000
Within groups            32    $C_{11a}$ = 2425.2      $C_{22a}$ = 3353.73     $C_{12a}$ = 2273.0      $b_a$ = 0.9372
Between groups            2    $C_{11m}$ = 83.09       $C_{22m}$ = 1005.01     $C_{12m}$ = 118.57      $b_m$ = 1.4270
TOTALS                   34    $C_{110}$ = 2508.29     $C_{220}$ = 4358.74     $C_{120}$ = 2391.57     $b_0$ = 0.9535
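The arithmetic behind Table 24.10 can be checked directly from the marks of Example 24.7. The following is a minimal sketch rather than anything in the original: the names `corrected_sums` and `slope` are hypothetical, and the sums are taken about the group means directly instead of about the working origin 50, which gives the same corrected quantities.

```python
# Marks of Example 24.7: (preliminary x, proficiency y) for each group.
groups = {
    1: ([45, 50, 56, 58, 59, 60, 62, 64, 65, 75],
        [46, 60, 52, 46, 48, 50, 55, 63, 58, 64]),
    2: ([44, 49, 52, 52, 58, 59, 60, 62, 63, 63, 66, 69, 70, 72, 73],
        [48, 55, 45, 60, 65, 64, 69, 71, 77, 70, 75, 80, 72, 75, 81]),
    3: ([47, 52, 59, 60, 63, 66, 68, 69, 74, 76],
        [43, 56, 51, 72, 60, 61, 55, 74, 72, 80]),
}


def corrected_sums(x, y):
    """Sums of squares and products about the means: (C11, C22, C12)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    c11 = sum(a * a for a in x) - sx * sx / n
    c22 = sum(b * b for b in y) - sy * sy / n
    c12 = sum(a * b for a, b in zip(x, y)) - sx * sy / n
    return c11, c22, c12


def slope(c):
    """Regression coefficient of y on x from a (C11, C22, C12) triple."""
    return c[2] / c[0]


within = {g: corrected_sums(x, y) for g, (x, y) in groups.items()}
pooled = tuple(sum(c[i] for c in within.values()) for i in range(3))
all_x = [a for x, _ in groups.values() for a in x]
all_y = [b for _, y in groups.values() for b in y]
total = corrected_sums(all_x, all_y)
# The between-groups row is the total row less the within-groups row.
between = tuple(t - w for t, w in zip(total, pooled))

for label, c in [("within 1", within[1]), ("within 2", within[2]),
                 ("within 3", within[3]), ("within groups", pooled),
                 ("between groups", between), ("total", total)]:
    print(f"{label:15s} C11={c[0]:9.2f} C22={c[1]:9.2f} C12={c[2]:9.2f} b={slope(c):.4f}")
```

Run as written, this reproduces the rows of the table to the number of places printed there.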


A comparison of the three regressions within groups indicates some heterogeneity.
It looks as if the preliminary test is not such a good predictor for the first group as for
the others. We may proceed to test the reality of this effect by constructing Table 24.11
on the lines of Table 24.9. For instance,

$$S_1 = \sum_j (C_{22j} - C_{12j} b_j) = (417.6 - 299.2 \times 0.4886) + (\text{two similar terms}) = 1048.1.$$
We find:

TABLE 24.11
Analysis of Covariance of Data of Example 24.7: Linear Regressions

Variation.                               d.f.   Sums S.                                   Quotient.
Deviations from regressions $b_j$         29    $S_1$ = 1048.1                            36.1
Differences of $b_j$                       2    $S_2$ = 175.4                             87.7
Deviations from $b_a$                     31    $S_1 + S_2$ = 1223.5
Deviations of groups from $b_m$            1    $S_3$ = 835.6
Difference between $b_a$ and $b_m$         1    $S_4$ = 19.3
TOTAL                                     33    $S_1 + S_2 + S_3 + S_4$ = 2078.4
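The entries of Table 24.11 follow from the corrected sums of Table 24.10 by repeated use of one formula, the sum of squares about a fitted line, $C_{22} - C_{12}^2/C_{11}$. The sketch below takes the printed values of Table 24.10 as input; the helper name `residual_ss` is hypothetical. Because the text works with regression coefficients rounded to four places, the figures obtained here differ from the printed ones by a unit or two in the first decimal place.

```python
# Corrected sums (C11, C22, C12) as printed in Table 24.10.
within_groups = [
    (612.4, 417.6, 299.2),
    (1052.4, 1697.73, 1213.4),
    (760.4, 1238.4, 760.4),
]
between = (83.09, 1005.01, 118.57)
total = (2508.29, 4358.74, 2391.57)


def residual_ss(c11, c22, c12):
    """Sum of squares of y about a fitted straight line: C22 - C12**2 / C11."""
    return c22 - c12 * c12 / c11


pooled = tuple(sum(g[i] for g in within_groups) for i in range(3))

S1 = sum(residual_ss(*g) for g in within_groups)   # about the separate b_j
S2 = residual_ss(*pooled) - S1                     # differences of the b_j
S3 = residual_ss(*between)                         # group means about b_m
S4 = residual_ss(*total) - S1 - S2 - S3            # b_a against b_m

print(f"S1 = {S1:.1f}, S2 = {S2:.1f}, S3 = {S3:.1f}, S4 = {S4:.1f}")
print(f"quotients: {S1 / 29:.1f} (29 d.f.), {S2 / 2:.1f} (2 d.f.)")
```

Obtaining $S_4$ by subtraction from the total mirrors the additive layout of the table; it is not an independent check on that item.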


A comparison of the quotient 36.1 (29 d.f.) with the quotient of the remaining items,
257.6 (4 d.f.), indicates that there are real differences between classes. A single regression
equation will not represent all three class-relations. A comparison of the deviations from
regressions, 36.1 (29 d.f.), with the differences of regressions among themselves, 87.7
(2 d.f.), does not reject the hypothesis of equality of regressions within groups. We therefore
compare the deviations from $b_a$, 39.5 (31 d.f.), with the deviations of groups from $b_m$,
835.6 (1 d.f.). This is significant, suggesting that the hypothesis of linearity of regression
of group-means should be rejected.

The general result is to confirm our suspicion of heterogeneity. The correlation
coefficients between x and y are:

Within first group      0.592
  ,,   second group     0.908
  ,,   third group      0.784

Within groups           0.797
Between groups          0.410
Total                   0.722

Again the deviations between groups stand out as indicating heterogeneity.


24.32. The analysis of covariance may be extended to the case where there is more 
than one independent variate. The regression coefficients are found in the usual way, 
and the sums of squares after regressions have been removed can be found and compared 
on the usual hypotheses. Suppose, for instance, there are two independent variates and 
a classification giving an analysis between classes and residual. We may represent the 
analysis thus :— 


                     d.f.    Sum of Squares.               Sum of Products.
                             $x_1^2$   $x_2^2$   $y^2$     $x_1 x_2$   $y x_1$   $y x_2$
Between classes      n       A         B         C         P           Q         R
Residual             n'      A'        B'        C'        P'          Q'        R'
TOTAL                n''     A''       B''       C''       P''         Q''       R''

Our regressions are then:

                     $b_1$                                        $b_2$
Between classes      $\dfrac{BQ - PR}{AB - P^2}$                  $\dfrac{AR - PQ}{AB - P^2}$
Residual             $\dfrac{B'Q' - P'R'}{A'B' - P'^2}$           $\dfrac{A'R' - P'Q'}{A'B' - P'^2}$
Total                $\dfrac{B''Q'' - P''R''}{A''B'' - P''^2}$    $\dfrac{A''R'' - P''Q''}{A''B'' - P''^2}$

The sums of squares C can then be reduced by eliminating regressions, i.e. by subtracting
$Qb_1 + Rb_2$, giving

$$C - \frac{BQ^2 - PQR}{AB - P^2} - \frac{AR^2 - PQR}{AB - P^2}
  = \frac{ABC - AR^2 - BQ^2 - CP^2 + 2PQR}{AB - P^2}. \tag{24.65}$$

This and the analogous quantities with primes give independent estimators of the
variance of the residual element, and a comparison to test homogeneity may be made in
the usual way.
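The reduction (24.65) is easily verified numerically: solving the two normal equations for $b_1$ and $b_2$ and subtracting $Qb_1 + Rb_2$ from C must give the same result as the closed form. The sketch below uses made-up figures and exact rational arithmetic; the function names are hypothetical and nothing here comes from the data of the chapter.

```python
from fractions import Fraction


def two_variate_regressions(A, B, P, Q, R):
    """Solve A*b1 + P*b2 = Q and P*b1 + B*b2 = R for (b1, b2)."""
    det = A * B - P * P
    return (B * Q - P * R) / det, (A * R - P * Q) / det


def reduced_sum_of_squares(A, B, C, P, Q, R):
    """Closed form (24.65) for C - Q*b1 - R*b2."""
    return (A * B * C - A * R**2 - B * Q**2 - C * P**2 + 2 * P * Q * R) / (A * B - P * P)


# Illustrative values only: A, B, C are sums of squares, P, Q, R sums of products.
A, B, C, P, Q, R = map(Fraction, (4, 3, 10, 1, 2, 1))
b1, b2 = two_variate_regressions(A, B, P, Q, R)
direct = C - Q * b1 - R * b2
closed = reduced_sum_of_squares(A, B, C, P, Q, R)
print(b1, b2, direct, closed)
```

The same two functions apply unchanged to the primed (residual) and double-primed (total) rows of the analysis.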


24.33. In a case such as that of Example 24.7 it is evident that a comparison of
y-means between groups is affected by what we know about the x-values. If we know nothing
about the latter, comparison of the y's is a univariate problem and can be treated by the
methods already discussed, the difference of means, for example, being tested by the use
of standard errors or the t-test. But suppose that our x's themselves are found to be different
between groups and that there is significant correlation between x and y. Then
it is possible that the relation, if any, between y's in different groups is not, so to speak,
an inherent quality of the variation of y, but is merely a reflection of their dependence on
the x's, which happen to exhibit significant differences. In Example 24.7, differences in
proficiency between groups may be due simply to differences of ability which were present
before the training began and, if so, should be shown by differences between groups in the
preliminary scores. We should not then be able to conclude from proficiency scores alone
that training in one group had a more marked effect than in another. The differences
were there before the training was applied.


24.34. If, then, we require to consider the effects of training alone on the groups,
we may “correct” the y-values by deducting the estimates

$$Y_{ij} - \bar{y}_{..} = b_0 (x_{ij} - \bar{x}_{..}) \tag{24.66}$$

or other more general regression equations. This, so to speak, allows for differences due
to variations of the x-variate.

Assuming that one linear regression equation adequately describes the relationship
between y and x, so that the corrected values are

$$y_{ij} - Y_{ij} = y_{ij} - \bar{y}_{..} - b_0 (x_{ij} - \bar{x}_{..}), \tag{24.67}$$

we see that the difference of the corrected means of two classes $\bar{y}_{.j}$ and $\bar{y}_{.k}$ is

$$\bar{y}_{.j} - \bar{y}_{.k} - b_0 (\bar{x}_{.j} - \bar{x}_{.k}). \tag{24.68}$$

This may be regarded as the sum of two parts which are independent. The estimated
variance of the first part, $\bar{y}_{.j} - \bar{y}_{.k}$, is $2s^2/q$, where $s^2$ is the mean-square of the residual after
correcting for regression and the means $\bar{y}_{.j}$ and $\bar{y}_{.k}$ are both based on q members. Similarly
the variance of $b_0$ is $s^2/A$, where A is the sum of squares of the x-variate entering into
the residual row of the analysis. Regarding the x's as fixed from sample to sample, so
that our inference is conditional, we see that the variance of the difference (24.68) is given by

$$s^2 \left\{ \frac{2}{q} + \frac{(\bar{x}_{.j} - \bar{x}_{.k})^2}{A} \right\}. \tag{24.69}$$

The ratio of the difference to the square root of this expression is distributed as “Student's”
t, with degrees of freedom one fewer in number than those of the original residual.
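The test of 24.34 amounts to three lines of arithmetic: the corrected difference, its estimated variance, and their ratio. A minimal sketch follows, assuming the variance takes the form $s^2\{2/q + (\bar{x}_{.j} - \bar{x}_{.k})^2/A\}$ with both class means based on q members; the function name and the figures in the usage line are invented for illustration and are not taken from Example 24.7.

```python
import math


def corrected_difference_t(ybar_j, ybar_k, xbar_j, xbar_k, b, s2, q, A):
    """Return (t, corrected difference, estimated variance).

    b  : regression coefficient used for the correction
    s2 : residual mean-square after correcting for regression
    q  : number of members on which each class mean is based
    A  : residual sum of squares of the x-variate
    """
    diff = (ybar_j - ybar_k) - b * (xbar_j - xbar_k)          # corrected difference
    variance = s2 * (2.0 / q + (xbar_j - xbar_k) ** 2 / A)    # its estimated variance
    return diff / math.sqrt(variance), diff, variance


t, diff, variance = corrected_difference_t(
    ybar_j=10.0, ybar_k=8.0, xbar_j=5.0, xbar_k=4.0, b=0.5, s2=4.0, q=8, A=16.0)
print(f"difference = {diff}, variance = {variance}, t = {t:.4f}")
```

The resulting t is referred to “Student's” distribution with one fewer degree of freedom than the original residual, the one degree being absorbed by the fitted regression.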


24.35. Similarly, if we have two independent variables $x_1$ and $x_2$ the corrected
difference of y-means is

$$\bar{y}_{.j} - \bar{y}_{.k} - \{ b_1 (\bar{x}_{1j} - \bar{x}_{1k}) + b_2 (\bar{x}_{2j} - \bar{x}_{2k}) \}, \tag{24.70}$$

where temporarily we write $\bar{x}_{1j}$ for the mean of the variate $x_1$ in the jth class, and so on.
The variance of the part in curly brackets may be derived by considering the variance of
the general expression $\lambda b_1 + \mu b_2$. From the equations for $b_1$ and $b_2$ we have

$$b_1 = \frac{B \sum (y x_1) - P \sum (y x_2)}{AB - P^2}, \qquad
  b_2 = \frac{-P \sum (y x_1) + A \sum (y x_2)}{AB - P^2}, \tag{24.71}$$

where, as in 24.32, A and B are the sums of squares for $x_1$, $x_2$, and P is the cross-product.
Thus the coefficient of any y in $\lambda b_1 + \mu b_2$ is

$$\frac{(\lambda B - \mu P) x_1 + (\mu A - \lambda P) x_2}{AB - P^2}.$$

Since the y's are independent the estimated variance of $\lambda b_1 + \mu b_2$ is

$$\frac{s^2}{(AB - P^2)^2} \{ A (\lambda B - \mu P)^2 + 2P (\lambda B - \mu P)(\mu A - \lambda P) + B (\mu A - \lambda P)^2 \}
  = s^2 \, \frac{\lambda^2 B - 2 \lambda \mu P + \mu^2 A}{AB - P^2}. \tag{24.72}$$

Thus for the estimated variance of the corrected difference (24.70) we have

$$s^2 \left\{ \frac{2}{q} + \frac{\lambda^2 B - 2 \lambda \mu P + \mu^2 A}{AB - P^2} \right\}, \tag{24.73}$$

where $\lambda = \bar{x}_{1j} - \bar{x}_{1k}$ and $\mu = \bar{x}_{2j} - \bar{x}_{2k}$. As usual, the difference divided by the square
root of this quantity may be tested in the t-distribution.
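The simplification in (24.72), from the sum of squared coefficients to the compact quadratic form in $\lambda$ and $\mu$, can be checked with exact arithmetic. In the sketch below the function names and the trial values are hypothetical; the only assumption carried over from the text is that A and B are the residual sums of squares of $x_1$ and $x_2$, and P their sum of products.

```python
from fractions import Fraction


def variance_factor_long(lam, mu, A, B, P):
    """Sum of squared y-coefficients of lam*b1 + mu*b2 (first form of 24.72)."""
    u = lam * B - mu * P    # multiplies x1 in the coefficient of y
    w = mu * A - lam * P    # multiplies x2 in the coefficient of y
    return (A * u * u + 2 * P * u * w + B * w * w) / (A * B - P * P) ** 2


def variance_factor_short(lam, mu, A, B, P):
    """Compact form: (lam^2 B - 2 lam mu P + mu^2 A) / (AB - P^2)."""
    return (lam * lam * B - 2 * lam * mu * P + mu * mu * A) / (A * B - P * P)


def corrected_difference_variance(s2, q, lam, mu, A, B, P):
    """Estimated variance (24.73) of the corrected difference of two class means."""
    return s2 * (Fraction(2, q) + variance_factor_short(lam, mu, A, B, P))


# Trial values, chosen only to exercise the identity.
lam, mu, A, B, P = map(Fraction, (2, -1, 4, 3, 1))
long_form = variance_factor_long(lam, mu, A, B, P)
short_form = variance_factor_short(lam, mu, A, B, P)
var73 = corrected_difference_variance(Fraction(3), 5, lam, mu, A, B, P)
print(long_form, short_form, var73)
```

With a single covariate (drop $x_2$, so that $\mu = 0$ and P = 0) the compact form reduces to $\lambda^2/A$, in agreement with (24.69).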
24.36. Our account of the analysis of variance and covariance has not attempted
to cover all the applications of the method in particular directions. We have concentrated 
so far as possible on the fundamental ideas and the broad lines of analysis to which they 
lead. Some further developments will be given in later chapters, but we must refer the 
reader who requires a complete acquaintance with the subject to the references given at
the end of this chapter and the preceding. We will conclude our exposition with three
final comments.

(a) Part of our hypothesis throughout has been that the residual element ¢ has constant 
variance from one subclass to another. In Chapter 26 we shall discuss methods of testing 
homogeneity in residual variance. For completeness we might perhaps have anticipated 
some of these tests in the present chapter, at least to the extent of exemplifying their use. 
We have not done so mainly for reasons of economy in space; but the omission of mention 
of the point in foregoing examples should not lead the reader to overlook (as many writers 
do overlook) the necessity for testing variance-homogeneity where possible, if it is required 
as part of the hypothesis.

(b) In the majority of our examples we have proceeded at once to analyses of variance
or covariance without dwelling on points which would require attention in any practical 
inquiry. For instance, since the primary function of many variance-analyses is to test 
the homogeneity of a set of class-means, the first stage would be to compute those means 
and examine whether they suggest any lack of homogeneity on intuitive grounds. Again, 
if heterogeneity is established, consideration of the means themselves, or of the primary 
data, will sometimes show how it arises. The student must never lose sight of his primary 
material. 

(c) Elaborating this point to some extent, we would emphasise that the analysis of 
variance, like other statistical techniques, is not a mill which will grind out results auto- 
matically without care or forethought on the part of the operator. It is a rather delicate 
instrument which can be called into play when precision is needed, but requires skill as 
well as enthusiasm to apply to the best advantage. The reader who roves among the 
literature of the subject will sometimes find elaborate analyses applied to data in order to 
prove something which was almost obvious from careful inspection right from the start ; 
or he will find results stated without qualification as “ significant ” without any attempt 
at critical appreciation. This is not the occasion to deliver a homily on the necessity for 
self-discipline in the use of advanced theoretical techniques, but the analysis of variance 
would provide quite a good text for a discourse on that interesting subject.


NOTES AND REFERENCES 


For the analysis of variance where subclass frequencies are unequal, see Brandt (1933) 
and an important paper by Yates (19342). Wilks (19380) has considered the subject from 
the theoretical viewpoint and exhibited the main results determinantally. For the missing 
plot technique see Allan and Wishart (1930) and Yates (19335) For the analysis of 
covariance see Fisher’s Statistical Methods, Bartlett (19342), an appendix by E. S. Pearson 
to a paper by Wilsdon (1934), Brady (1935), Wishart (1936), and Day and Fisher (1937). 
The last-mentioned paper works through a practical example in some detail and will
repay study. 

See also references to the previous chapter.

EXERCISES
24.1. For a two-way classification with one member in each subclass show that,
for normal variation,

$$E (\bar{x}_{j.} - \bar{x}_{..})(\bar{x}_{.k} - \bar{x}_{..}) = 0,$$

and hence that the sums $\sum_j (\bar{x}_{j.} - \bar{x}_{..})^2$ and $\sum_k (\bar{x}_{.k} - \bar{x}_{..})^2$ are independent. Examine
how this breaks down for the non-orthogonal case.
24.2. Verify the arithmetic of Example 24.6. 


24.3. Generalise formula (24.73) in the following way. If there are m independent
variates, the variance of corrected differences is

$$s^2 \left\{ \frac{2}{q} + \sum_{r,s=1}^{m} c_{rs} \lambda_r \lambda_s \right\},$$

where $\lambda_r = \bar{x}_{rj} - \bar{x}_{rk}$, and $c_{rs} = A_{rs} / |a_{rs}|$, where $A_{rs}$ is the cofactor of $a_{rs}$ in the determinant
$|a_{rs}|$, and $a_{rs} = \sum x_r x_s$ summed over the sample.
(Wishart, 1936.)


24.4. Derive by the analysis of variance the test of a regression coefficient given 
in 22.19. 




CHAPTER 25 
THE DESIGN OF SAMPLING INQUIRIES 


Influence of Theory on Sampling Design 

25.1. The reader who is accustomed to handling the results of a sampling investigation 
as they appear in everyday statistical work may have wondered more than once in previous 
chapters whether theory was not reaching out too far in advance of practice. It is true 
that for certain types of experimental inquiry, notably in agricultural and biological research, 
the precision of exact statistical tests does not seem out of place ; but in economic or social 
statistics, for example, there is often so much error and imperfection in the raw data that 
the application of refined methods of analysis would be a waste of time. It is clearly 
useless, and may even be dangerous, to exercise an elaborate mathematical technique on 
data which are suspect from the very start of the inquiry. If our theory is to be really
serviceable to the statistician and not merely an enticing mental exercise it must be capable 
of solving practical problems. 


25.2. Now it has to be admitted that much of the material with which statisticians 
have to work at the present day cannot be treated by the methods expounded in the fore- 
going pages when sampling questions are concerned. The commonest reason, but by no 
means the only one, is that the sampling process by which the data were obtained was 
biassed. In such cases the statistician has to lay aside the refined implements of his craft 
and do the best he can with his refractory material in the light of his own judgment and 
commonsense. A good deal of current statistical work is of this kind, and there is even 
a section of thought which is inclined to depreciate the advanced theory of the subject as 
“ academic ” in the sense that it is too remote from practical affairs to be worth studying. 
The misunderstanding is not likely to be removed by the counter-accusation sometimes 
launched by theoreticians that the theory is quite capable of being applied by anyone who 
has the ability to comprehend it.


25.3. Fortunately there is a growing realisation that the two points of view can 
often be reconciled by collecting the data in such a form that the theory can be applied to 
it. If only enough care is taken at the initial stages of an inquiry there is no need for the
appearance of imperfect data which defy exact analysis. Knowing beforehand what
theoretical instruments are at our disposal, and armed with a clear understanding of what
questions we are trying to answer, we can frequently frame the investigation so as to maximise
the information acquired with the minimum of effort. In short, the scope and nature
of our theory itself dictates, to some extent, the form which the sampling inquiry should 
assume. In former times the statistician was usually asked to extract information from 
data which were collected by inexpert agents, frequently for quite different purposes. 
Nowadays he is still in the same position in some respects, but sometimes he is called in to 
advise on the design of the inquiry and can, within limits, determine the form in which the 
data are collected. He can make his theory applicable by selecting his sample in the
proper way.


25.4. The general theory of the design of sampling inquiries has not progressed far
enough for us to be able to give a systematic account of it in this chapter. In some fields,
particularly that of agricultural experimentation, it has reached quite an advanced degree
of perfection; in others there remain many problems unsolved and possibly many more
which have not yet even been formulated. At the risk of some discontinuity of treatment,
therefore, we shall only give in this chapter a number of instances in which theoretical
considerations exert a considerable effect on the scope of a sampling inquiry, in order to
illustrate the field to be covered. There are, of course, many factors which ultimately
determine the form of an investigation, such as cost and expenditure of time, but they will
not concern us here. For the present we shall be concerned solely with the extent to which
theoretical considerations contribute to all the factors that have to be taken into account
when an inquiry is designed.

Preliminary Points


25.5. There are certain preliminary points which, though obvious enough when stated 
explicitly, are often overlooked and cause a good deal of bad design. 


(a) The fundamental object of sampling is to obtain information about a population,
and it is of the first importance to begin with a clear idea of what that population
is. Imagine, for instance, that we are asked to ascertain whether pasteurised milk has
a different feeding value from raw milk. In what population is this inquiry to be made:
among children? among the inhabitants of the British Isles? among those who habitually
drink milk or those who do not? among townspeople or among country folk? and so
on. Again, suppose that we are given a new variety of barley and wish to know whether
it has a heavier yield than a previously known type. Do we mean heavier in the usual
barley-growing areas? in every kind of climate or on the average over a series of different
climatic conditions? when subject to the same manurial treatments as those in current
use? and so on.

(b) In a similar way, it is necessary to have an equally clear idea of what we are trying
to find out about the population. In our example of raw and pasteurised milk, are we
content to know that there is (or is not) a differential effect for children as a whole? or do
we wish to ascertain whether any such effect varies at different ages, between sexes, or
according to nutritional standards? What exactly should we like to know? It is no use
returning the facile reply “all about it” to this query, for our information must be limited
in virtue of the finite size of our sample. We must make up our minds what information
we require and which questions have priority if it becomes necessary to sacrifice some of
them for practical reasons.

(c) Thirdly, we should consider what we know already about our population. This
point becomes of particular importance when our prior knowledge indicates heterogeneity,
for then we may, in effect, have to divide the population into sub-groups and sample separately
from each. In our milk example, it is to be expected that children of different ages
may react differently, or that children from lower-class schools may respond differently
from those in middle-class schools. Or again, in our barley example, the two varieties
may compare quite differently on Hertfordshire loam and on Lincolnshire chalk. It would
be misleading to lump all the comparisons together when we have strong reason to suspect
heterogeneity beforehand. In effect, prior knowledge of this kind frequently dictates the
types of question we ask under (b), and the two are often different facets of the same problem.

(d) As an extension of the same point, we may notice that prior knowledge about the
population sometimes indicates what sort of averages to use and what sort of tests of
significance it is proper to apply. Crop-yields, for instance, are known to be distributed
in a form approaching the normal, so that arithmetic means are good estimates of parent
means and the tests based on normal theory may be applied. Accident statistics, on the
other hand, are often distributed in a modified Poisson form; income statistics in a J-shaped
form, and so forth.

(e) A specification of the population and a decision as to the precise object of the
inquiry will usually determine certain parameters which it is required to estimate or certain
hypotheses for test. In general the problem is one of estimation, but not necessarily so.
In our case of pasteurised and raw milk, for instance, we should probably wish to know
the exact amount of the difference between the effects of the two (a matter of estimation),
not merely whether a difference existed (a matter of significance). We then wish to know,
before the inquiry begins, whether the estimates we shall have are going to be accurate
enough for our purpose; or alternatively, if the sample is of a given size, how accurate they
will be. It may not always be possible to answer such a question completely beforehand,
since the sampling variances will in general depend on quantities which have to be estimated
when the data are available, but it is always useful to consider in a general way what sort
of magnitudes would be shown as significant and what values would leave us still in reasonable
doubt. As a rule, matters such as this are closely related to sample size.

(f) Finally, our estimates will be subject to experimental error and, in development
of the last point, we have to try to find the form of experimental design which, while answering
our questions, does so with the minimum error. From a slightly different standpoint,
if we can determine the amount of error which is admissible, the problem is to find the
design which achieves no more than that error with the minimum expenditure of effort.
Furthermore, we require to be able to estimate the extent of probable errors. In short, we
require an efficient design, just as the engineer requires an efficient engine or the aircraft
designer an efficient form of airscrew, and for exactly the same reasons.


25.6. To sum up, our primary task in embarking on a sampling inquiry is to ascertain
as accurately as possible what is the population under examination, and what is the informa- 
tion about it which we require. If, as usually is the case, that information concerns statis- 
tical characteristics such as means and variances, or more generally frequency-distributions, 
our second task is to design an inquiry which will provide estimates of these unknown
quantities and will, at the same time, provide estimates of their sampling error. It is not 
always possible, as we shall see later, to obtain full satisfaction in the reduction of error 
and the estimation of error simultaneously. Increased accuracy of estimation may mean 
loss of precision in our estimate of sampling error, so that although we are nearer the truth 
we do not know how near. There does not appear to be any single rule which will cover 
all the cases that can arise. We shall refer to a particular case of some interest in 25.39. 


Stratified Sampling
25.7. We consider at the outset a case of fairly frequent occurrence in the sampling
of existent populations. Suppose we are interested in the mean value of a variate x in
some population $\Pi$, and that we know, or suspect, that the population is heterogeneous
in the sense that we can delimit sub-populations $\Pi_1, \Pi_2, \ldots, \Pi_k$ in which the distributions
according to x may differ. This type of case might, for example, arise if we were sampling
the population of a town for income, there being districts, wards or even streets which are
known to be inhabited by classes living at different income-levels.

Practical considerations alone may require that we draw a prescribed portion of the
sample from each sub-population. For instance, with a town of 500,000 inhabitants it
would be most tedious to sample by using random numbers applied to the whole town.
We should probably divide the work among districts and blocks and select random samples
within the blocks. This, however, is not to be confused with the division of the town into
relatively homogeneous districts because of its heterogeneity. Either process is called
stratification. The problem we shall discuss is this: If we have decided to draw a total
sample of n members, and can assign at will the number $n_i$ drawn from the ith stratum
$\Pi_i$, subject to the condition $\sum (n_i) = n$, how should we choose the numbers $n_i$, or need we
choose them at all? Will our estimate of the mean value of x be better if we merely choose
n members at random from $\Pi$, or can we improve it by controlling the numbers $n_i$ and not
merely leaving them to chance?


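25.8. Let $x_{ij}$ be the jth member of the sample from the ith sub-population, and let
the latter contain a number $N_i$ of members with mean $\mu_i$ and variance $\sigma_i^2$. If $\mu$ is the
mean of $\Pi$ we shall have

$$\mu = \frac{1}{N} \sum_{i=1}^{k} N_i \mu_i, \tag{25.1}$$

where $N = \sum N_i$. We shall now seek for parameters $\lambda_{ij}$ such that our estimator of $\mu$, say t, is given by

$$t = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (\lambda_{ij} x_{ij}), \tag{25.2}$$

that is to say, is a linear estimator in the observed variate-values. We shall seek for that
estimator which is unbiassed and has minimum variance, i.e. for which

$$E(t) = \mu, \tag{25.3}$$

$$E \{ t - E(t) \}^2 = \text{minimum}. \tag{25.4}$$

Substituting from (25.2) and (25.1) in (25.3), we find

$$E \Big\{ \sum_i \sum_j \lambda_{ij} x_{ij} \Big\} = \frac{1}{N} \sum_i N_i \mu_i,$$

and since $E(x_{ij}) = \mu_i$ this gives

$$\sum_i \mu_i \Big( \sum_j \lambda_{ij} - \frac{N_i}{N} \Big) = 0. \tag{25.5}$$

For this to be generally true we must have

$$\sum_j \lambda_{ij} = \frac{N_i}{N}, \tag{25.6}$$

a first condition on the $\lambda$'s. If $\bar{\lambda}_i$ is the mean of $\lambda_{ij}$ in the ith set we have

$$\bar{\lambda}_i = \frac{N_i}{n_i N}. \tag{25.7}$$

Now consider (25.4). The variance of t is the sum of k variances, for the samples from
sub-populations are independent. Consider then the variance of $\sum_j \lambda_{ij} x_{ij}$, remembering
that the population of $N_i$ members is finite. We have

$$\begin{aligned}
\text{variance} &= E \Big\{ \sum_j \lambda_{ij} (x_{ij} - \mu_i) \Big\}^2 \\
&= \sum_j \lambda_{ij}^2 \sigma_i^2 + \sum_{j \neq l} \lambda_{ij} \lambda_{il} \, E \{ (x_{ij} - \mu_i)(x_{il} - \mu_i) \} \\
&= \sum_j \lambda_{ij}^2 \sigma_i^2 - \sum_{j \neq l} \lambda_{ij} \lambda_{il} \frac{\sigma_i^2}{N_i - 1} \\
&= \frac{N_i \sigma_i^2}{N_i - 1} \sum_j \lambda_{ij}^2 - \frac{\sigma_i^2}{N_i - 1} \Big( \sum_j \lambda_{ij} \Big)^2 \\
&= \frac{\sigma_i^2}{N_i - 1} \Big\{ n_i (N_i - n_i) \bar{\lambda}_i^2 + N_i \sum_j (\lambda_{ij} - \bar{\lambda}_i)^2 \Big\}.
\end{aligned} \tag{25.8}$$

This is clearly minimised only if

$$\lambda_{ij} - \bar{\lambda}_i = 0, \tag{25.9}$$

that is, if all the $\lambda$'s for any sub-population are equal. This is what we should expect on
intuitive grounds, for there is no reason for weighting the sample members differently in
the same sub-sample.

Our minimal variance, say v, is then given from (25.8), by summing over i, as

$$\begin{aligned}
v &= \sum_i \frac{\sigma_i^2 \, n_i (N_i - n_i)}{N_i - 1} \bar{\lambda}_i^2 \\
&= \frac{1}{N^2} \sum_i \frac{\sigma_i^2 (N_i - n_i) N_i^2}{(N_i - 1) n_i} \\
&= \frac{1}{N^2} \sum_i \frac{\sigma_i^2 N_i^3}{(N_i - 1) n_i} + \text{constant}.
\end{aligned} \tag{25.10}$$

This is a minimum for variations in $n_i$ subject to $\sum n_i = n$ if

$$\frac{\sigma_i^2 N_i^3}{(N_i - 1) n_i^2} = p,$$

where p is an undetermined constant. This yields almost at once

$$n_i = n \, \frac{\sigma_i N_i \sqrt{N_i / (N_i - 1)}}{\sum_i \sigma_i N_i \sqrt{N_i / (N_i - 1)}}. \tag{25.11}$$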
25.9. If we know the population variances $\sigma_i^2$ and the numbers $N_i$ this equation
determines the numbers $n_i$; but in practice it is rather unlikely that we should know the
variances without knowing the means, in which case we should not have to sample to find
the mean of the whole population. Our result is not, however, useless. In the first place
we find for the estimator t

$$t = \sum_i \sum_j \lambda_{ij} x_{ij} = \frac{\sum_i N_i \bar{x}_i}{\sum_i N_i}, \tag{25.12}$$

where $\bar{x}_i$ is the mean of the sample from the ith stratum,
so that the estimate is a weighted average of the sample means, the weights being proportional
to the population numbers $N_i$, not to the numbers $n_i$. Secondly, without knowing
the variances $\sigma_i^2$ exactly, we may sometimes reach approximations from prior knowledge
of the populations. Such values, without giving absolute accuracy, will at least represent
improvements on selecting the n's by chance.
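The estimator of 25.9 is simple to compute: each stratum's sample mean is weighted by the size of the stratum in the population, not by the number sampled from it. A minimal sketch, with invented figures and a hypothetical function name:

```python
def stratified_mean(samples, stratum_sizes):
    """Weighted average of the sample means, with weights N_i (the stratum sizes)."""
    N = sum(stratum_sizes)
    return sum(Ni * sum(s) / len(s) for s, Ni in zip(samples, stratum_sizes)) / N


# Three strata of 100, 300 and 600 members, sampled unequally (made-up values).
samples = [[10, 12, 14], [20, 22], [31]]
sizes = [100, 300, 600]

t = stratified_mean(samples, sizes)
crude = sum(sum(s) for s in samples) / sum(len(s) for s in samples)
print(f"stratified estimate = {t:.2f}, unweighted sample mean = {crude:.2f}")
```

The unweighted mean is printed only for contrast; with sampling fractions as unequal as these it is dominated by the smallest stratum and would be a biassed estimate of the population mean.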
25.10. If the numbers $N_i$ are effectively infinite the formulae simplify, and, for
instance, instead of (25.11) we have

$$n_i \propto \sigma_i N_i, \tag{25.13}$$

the sample number varying with the standard deviation in the stratum concerned, as well
as its number of members.


25.11. If there is no information available at all about the variances $\sigma_i^2$ the most
reasonable course in applying (25.11) appears to be to suppose them all equal. In such
a case, for large $N_i$ we have

$$n_i \propto N_i, \tag{25.14}$$

or the sampling numbers are proportional to the population numbers. This is what we
might expect on intuitive grounds. If the populations are infinite the $n_i$'s are equal, which
again is in accordance with intuitive ideas.
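The allocation rules of 25.8 to 25.11 can be compared side by side. The sketch below assumes that the minimising condition of 25.8 gives $n_i$ proportional to $\sigma_i N_i \sqrt{N_i/(N_i - 1)}$, which reduces to $\sigma_i N_i$ for large strata and to $N_i$ when the variances are taken equal. Function names and figures are hypothetical, and the allocations are left unrounded; in practice they would be rounded to whole numbers summing to n.

```python
import math


def optimum_allocation(n, sizes, sigmas):
    """Allocate n among strata in proportion to sigma_i * N_i * sqrt(N_i / (N_i - 1))."""
    weights = [s * Ni * math.sqrt(Ni / (Ni - 1)) for Ni, s in zip(sizes, sigmas)]
    total = sum(weights)
    return [n * w / total for w in weights]


def proportional_allocation(n, sizes):
    """Allocate n in proportion to the stratum sizes N_i alone."""
    N = sum(sizes)
    return [n * Ni / N for Ni in sizes]


sizes = [1000, 3000, 6000]    # stratum sizes N_i (made up)
sigmas = [10.0, 5.0, 2.0]     # assumed stratum standard deviations

best = optimum_allocation(100, sizes, sigmas)
prop = proportional_allocation(100, sizes)
equal_var = optimum_allocation(100, sizes, [1.0, 1.0, 1.0])
print([round(v, 2) for v in best])
print([round(v, 2) for v in prop])
```

With these figures the small but highly variable first stratum receives far more than its proportional share, which is the practical content of the result.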


25.12. The above will serve as an illustration of the way in which theoretical requirements
can influence the scope of an inquiry conducted among an existent population. By
seeking for an estimator with minimum variance we have been led to expressions determining
the allocation of sample numbers among the different strata; and incidentally, of
course, we have derived expressions for the minimum variance, so that the maximum
possible precision can be ascertained. The fact that some of our results depend on unknown
constants suggests that in some circumstances it may be worth while conducting a preliminary
or “pilot” inquiry in order to estimate the unknowns and hence to improve the
precision of the main inquiry which is to follow. The possibilities of such pilot surveys
have yet to be explored, but the technique appears to merit serious investigation.


25.13. In passing, we may mention one other topic of great practical importance on
which theory can throw a good deal of light, that of optimum size of a sampling unit. In
sampling a human population of a town, for instance, need we take individuals as our
units? It would be easier to sample households, or streets, or even whole districts; but
do we lose anything by this method, and if so, how much? Furthermore, the grouping of
individuals into units of larger size sometimes has a peculiar effect on correlations which
may lead to erroneous conclusions, and a theoretical investigation may be required to safeguard
against error. We shall not pursue the subject further here (the sampling problem
would require a book in itself), but the reader who is interested may like to consult some
of the papers referred to at the end of the chapter.


The Design of Experiments 


25.14. For an existent population the flexibility of sampling technique is somewhat 
limited. We are given an aggregate of values, some of which are to be extracted for scrutiny, 
and no manipulation of the sampling can tell us more than exists, so to speak, already
inscribed upon the population itself. Consequently the main line of endeavour in such
cases lies in estimating with the greatest accuracy (which is largely a matter of choosing 
the right statistics and minimising sampling variability), or in ensuring that sufficient 
material is available to enable the requisite comparisons to be made with significance 
(which is largely a matter of sample size and selecting the most suitable tests of significance). 
Nothing can alter the population, and theory will, as a rule, only react upon the sampling 
process by some such method as has already been exemplified, e.g. in dictating that the
sampling must be random, in stratifying the population before the sampling is carried out,
and in deciding how limited resources can be expended to the best advantage. 


25.15. For hypothetical populations there are often wider possibilities, for the nature 
of the inquiry may itself determine which populations are to be studied, and the populations 
may, to a certain extent, be set up at will. For instance, if we are interested in an inquiry 
into the relationship between income and size of family in the United Kingdom, the popula- 
tion already exists and we cannot go outside it ; whereas if we wish to discuss the effect 
of a poison on bacterial growth or of a fertiliser on the yield of barley we can not only 
reproduce experimental data ad libitum but can arrange the inquiry so as to confine it to 
certain populations (e.g., by considering only a given type of bacterium in fixed nutritional 
circumstances or at fixed temperatures), or we may extend the domain of consideration as 
far as purely practical limitations will allow (e.g., by growing barley in new surroundings 
or in new climates). This is rather a pretentious way of saying that we may experiment 
in a domain which, within limits, can be assigned at will. The statistician has a much 
greater scope for ingenuity in the design of experiments than in the design of sampling 
inquiries on existent populations because of the greater degree of control over the population 
under examination.


25.16. In the classical ideal experiment, only the factors under consideration were 
allowed to vary, other conditions being kept as constant as laboratory practice would allow 
—in investigations concerning the relation between resistance and current in an electric 
circuit, for instance, attempts would be made to keep factors such as temperature and 
external magnetic effects strictly constant. It would be recognized that there would be 
residual errors which would affect the exactitude of the results, but these would be
measurable on certain assumptions.


25.17. Statistical theory can, of course, deal with such cases, but it can also go farther 
and often wishes to do so. In the first place, it frankly admits the existence not only of 
experimental error (in the sense of aberration from a “ true ” value) but of the much wider 
type of variation which gives rise to frequency-distributions in practice. Instead of isolating 
particular factors for study, it may wish to give full play to the disturbances which arise 
in practice in order to investigate what happens in “natural” conditions. For this reason,
statistical experiments are often complex in the sense that a number of factors are allowed 
to vary simultaneously. 

Secondly, the admission of outside influences which together make up what is generally 
called experimental error implies that it should be possible to estimate the extent of such 
error from the data themselves. We wish to obtain, not the functional relations between 
variables which may only exist under artificial conditions, but the stochastic relations
observed in practice.


25.18. The effect of this on experimental design is that the hypothetical population 
we consider is often a rather general one. Taking the case of trials of a new variety of 
barley as an example, we should wish to compare its yields with those of other varieties 
in different soil conditions, with different manurial treatments, in different years (so as to 
get variations in climate), and so on. Furthermore, to obtain estimates of the error due 
to other factors we usually have to replicate the experiment. A great number of
inter-comparisons fall to be made, and the process of design is essentially that of finding a form
of experiment which will permit all these comparisons and yet save as much unnecessary
labour as possible.


Orthogonality 


25.19. To reduce the discussion to more concrete terms we will consider the testing 
of a new variety of barley. In order to study its behaviour under different soil conditions 
we will select a number of areas in which barley is grown and choose a block of ground in 
each. This will give us inter-soil comparisons. We will also arrange to carry the experi- 
ment on for a period of years, so that climatic variations may also be compared. The 
other factor in which we are interested is the response to certain manures, which we will 
take to be dung (D), potash (K), nitrogen (N), and phosphates (P). 

Consider any block at any one place in any year. We will decide on certain standard 
quantities of the four manures and assume that for any manure either a dressing of this 
standard amount is to be given, or it is to be withheld. This simplifies the experiment, 
for then every manure either is or is not applied, and our results can be classified by simple 
dichotomies. Of course more complicated experiments can be devised to allow for different 
quantities of fertiliser, but the simpler case will be sufficient for our purposes. 

We have then set up a population which can be classified according to six qualities, 
place, time, and the application of four manures. Our results are intended to show whether 
there is any variation in yield between these conditions and various combinations of them. 
Of course, it does not follow in deductive logic that if there is significant variation from year
to year in the particular years chosen there will always be temporal or climatic variation ; 
and similarly, if there is significant variation from place to place it does not follow that 
other soil conditions which have not been tested will show a significant variation. To 
arrive at such conclusions we have to perform an ordinary generalisation by induction. 
What we shall say, if significant results appear, is that in the regions tested, or for the years 
tested, there were significant variations, and that it therefore appears likely that soil and 
climate exert a material effect on yield; and we shall maintain this with more or less
confidence according as our experience is wider or narrower. This is the familiar inductive
inference which forms the basis of all scientific inquiry. 


25.20. Within any one block we shall wish to study the effect of manurial treatments 
not only separately but in combination. We therefore divide the block into sixteen com- 
partments and treat them, respectively, with no manure, D, K, N, P, DK, DN, DP, KN, 
KP, NP, KNP, DNP, DKP, DKN and DKNP. Here every possible combination appears
once and only once. To compare, for instance, the mean yields in the presence or absence 
of dung we add all the eight yields for plots on which no dung was spread and compare it 
with the sum of the other eight. All the necessary comparisons can be made. 
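The sixteen combinations and the comparison for dung can be set out mechanically. The following is a minimal sketch in Python, not part of the original account: the function names are our own, and the yields are invented purely for illustration (a base of 10 units, plus 2 where dung is applied and 1 where potash is applied).

```python
from itertools import product

MANURES = ("D", "K", "N", "P")

def factorial_treatments(factors):
    """Every present/absent combination of the factors, each as a frozenset."""
    return [frozenset(f for f, on in zip(factors, flags) if on)
            for flags in product((False, True), repeat=len(factors))]

def main_effect(yields, factor):
    """Mean yield of plots receiving the factor minus mean yield of the rest."""
    with_f = [y for t, y in yields.items() if factor in t]
    without_f = [y for t, y in yields.items() if factor not in t]
    return sum(with_f) / len(with_f) - sum(without_f) / len(without_f)

treatments = factorial_treatments(MANURES)

# Hypothetical yields, for illustration only.
yields = {t: 10.0 + (2.0 if "D" in t else 0.0) + (1.0 if "K" in t else 0.0)
          for t in treatments}

print(len(treatments))                      # number of plots in the block
for manure in MANURES:
    print(manure, main_effect(yields, manure))
```

Because each manure is present on exactly eight plots and absent on the other eight, the effects of the other manures cancel in each comparison; this is the practical meaning of the equal frequencies.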

Data of this kind are said to be orthogonal. Each possibility arises an equal number of 
times. The reason for the use of the word is that such material is orthogonal in the sense 
we have considered in the analysis of variance. We saw in Chapters 23 and 24 that where 
cell-frequencies were equal the analysis was greatly simplified, and that under the custom- 
ary hypotheses the estimates of means were independent. It is not, of course, absolutely 
necessary to have orthogonal data—in fact, we have shown in Chapter 24 how to deal with 
the non-orthogonal case; but it is evidently a great convenience to be able to arrange 
for orthogonality, and no efficiency is lost by doing so.


Replication

25.21. If, as suggested above, we divide each block into 16 plots and treat each differ- 
ently, the analysis of variance of any block will have 15 degrees of freedom ; and if we 
cannot ignore any of the interactions there will be no residual variance due to “error”;
that is to say, we cannot estimate the reliability of our comparisons. All the 15 possible
independent comparisons may be made, but we cannot decide whether differences are
significant or whether they may merely be due to the other factors which we have agreed
to allow to bear on the experiment, such as individual soil differences from plot to plot.
If we are to estimate such “error” we must give the factors which produce it an
opportunity of varying. This may be done by replicating the experiment, that is to say, by
repeating it in the same form. For instance, suppose that we set up four blocks and divide 
each into 16 plots, applying our manurial treatments to each block. Then, assuming that 
there are no significant interactions between blocks and treatments (a matter which we 
can test by examining the interaction terms in the variance-analysis), we shall have 63 
degrees of freedom, of which 15 are assignable to treatments and their interactions and the 
remaining 48 to a “residual” term, the latter providing an estimate of experimental error.
We have exemplified this process in Chapter 23. 


Randomisation 

25.22. Up to this point we have said nothing about the arrangement of our 16 plots 
within the block. Suppose we divide our block into plots of equal size. Is there any 
advantage in allocating the treatments systematically, or is it preferable to assign them
at random?

We shall consider the relative merits of random and systematic arrangements in more 
detail below, but we can announce the general rule now : unless there is some good reason 
to the contrary, it is better to allot the treatments at random. Where possible, chance
should be given full play.


25.23. The justification for this rule in our present instance can be seen by reference 
to the section on randomised blocks in 23.41. We saw there that by randomising the 
allocation of plots we were able to preserve the z-distribution and hence to validate our 
tests of significance, even where normality in the parent form was not assumed. The
process is essentially one of extending our hypothetical population. Instead of considering
the observed yields as specimens of what might happen in repeated trials of the same variety 
of barley if the same manurial treatments were applied to the same plots, we consider the 
possible yields in repeated trials if the manurial treatments were applied in all possible 
ways to different plots. Our experiment is systematic in the sense that we prescribe a 
different treatment for each plot; it is random to the extent that we allot the treatments
to plots by chance.


25.24. There is one source of possible confusion here which it is desirable to remove.
In our agricultural example complications arise because of the physical contiguity of the 
plots, and we shall see below that it is often desirable to eliminate by special designs system- 
atic fertility gradients in the soil. In other classes of experiment where we desire orthogon- 
ality, the members need not be subject to this kind of effect, and often are not. Reverting 
to the example of raw versus pasteurised milk which has already been mentioned, suppose 
we take a simplified case and wish to measure whether the two different milks have different
effects on boys and girls. With a class of 40 children, 20 boys and 20 girls, we can proceed
in several ways. It is obviously useless to give raw milk to all the boys and pasteurised
milk to all the girls, for then we have no measure of the differential effect, if any, for either
sex alone. We might toss up in each case and allot raw or pasteurised milk to each child
by chance; but this would probably make the data non-orthogonal. To attain orthogon- 
ality, we should allot 10 children to each of the four sub-groups BP, GP, BR, GR (where 
B = boy, G = girl, P = pasteurised, R = raw). We then have an analysis of variance— 


                                        Degrees of freedom

Between sexes                                    1
Between milks                                    1
Residual (including interactions)               37
TOTAL                                           39


This is analogous to a test of a cereal with two fertilisers and 10 replications. 

The question is, how should we allot the children to the four groups? Their sex, of 
course, is determined, but the nature of the milk they receive is at choice. It is here that 
the randomisation will help. The ten children of a specified sex who receive raw milk 
should be chosen at random from the 20 available. In this instance it might be thought 
that any method would do ; but it is best to avoid the risk of bias. If the children were 
chosen by the teacher he might tend to select the 10 bigger boys or the 10 brighter boys.
If they were chosen alphabetically, we might get brothers and sisters automatically
receiving the same treatment; and so on. The randomisation process avoids all systematic
effects of this kind and brings us a stage nearer to obtaining an unbiassed answer to our 
questions. 
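The allotment just described can be sketched in a few lines of Python. This is an illustration under our own assumptions rather than a prescribed procedure: the children's names are placeholders, the function name is hypothetical, and a pseudo-random generator stands in for a table of random sampling numbers.

```python
import random

def allot_milk(children_by_sex, per_group, seed=None):
    """Randomly split each sex into a raw-milk and a pasteurised-milk group.

    Returns a dict keyed by "BR", "BP", "GR", "GP" (sex followed by milk).
    """
    rng = random.Random(seed)   # the seed only makes the draw repeatable
    groups = {}
    for sex, names in children_by_sex.items():
        if len(names) < 2 * per_group:
            raise ValueError("not enough children of sex " + sex)
        chosen = rng.sample(names, 2 * per_group)   # random order, no repeats
        groups[sex + "R"] = sorted(chosen[:per_group])
        groups[sex + "P"] = sorted(chosen[per_group:])
    return groups

# Placeholder names for the 20 boys and 20 girls of the class.
children = {
    "B": ["boy%02d" % i for i in range(1, 21)],
    "G": ["girl%02d" % i for i in range(1, 21)],
}
groups = allot_milk(children, per_group=10, seed=1)
for label in ("BR", "BP", "GR", "GP"):
    print(label, groups[label])
```

The sex of each child is fixed in advance and only the milk is left to chance, so the four sub-groups remain of equal size and the data stay orthogonal.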


Sensitivity of a Test 


25.25. In some cases, where the variate is discontinuous, the nature of the test of 
significance which we propose to apply may make a difference to the form of the experiment. 
If we are testing a certain hypothesis which can produce a specified number m of
experimental results which are acceptable as conforming to the hypothesis, whereas other
hypotheses produce a number n of other results, we clearly want to keep m as small as 
possible compared with n. The ideal case, of course, is that of the “crucial” experiment
in which the hypothesis can only give one result and other hypotheses give a different 
result. The result then proves or disproves the truth of the hypothesis, and no test of 
significance arises. In statistical practice we do not as a general rule perform crucial 
experiments, but we can sometimes design an experiment so that it is more crucial, if the 
expression be allowed, than alternative methods. 


25.26. Consider, for instance, the case of a cashier who claims to be able to detect 
good money from false at a glance. To test this ability we spread ten coins before him, 
tell him that p are good, and ask him to point them out. What number of good coins p 
should we include among the ten ? 

If the cashier had no power of discrimination and there are p good coins, the
probability that he would guess right by chance is

$$\frac{1}{\binom{10}{p}},$$

for the total number of ways of selecting p from 10 is the denominator of this fraction and
only one of them is right. Now we want to choose p so as to minimise the probability of
such an event, i.e. so as to maximise $\binom{10}{p}$. This is clearly done when p = 5, so that we
ought to have five good and five bad coins in the set. Any other number would increase
the probability that he might be right by chance and hence decrease the sensitivity of the 
experiment. 
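The choice of p can be checked by direct enumeration. The short Python sketch below (the function name is our own) tabulates the chance of a wholly correct guess for each possible number of good coins among ten.

```python
from fractions import Fraction
from math import comb

def chance_of_lucky_guess(n_coins, n_good):
    """Probability of picking exactly the good coins by pure guesswork."""
    return Fraction(1, comb(n_coins, n_good))

# Chance of success for every possible number of good coins among ten.
probs = {p: chance_of_lucky_guess(10, p) for p in range(0, 11)}
best_p = min(probs, key=probs.get)   # the value of p giving the smallest chance

for p, prob in probs.items():
    print(p, prob)
print("most sensitive design: p =", best_p)
```

The extreme cases p = 0 and p = 10 give a chance of unity, since the cashier then cannot fail, and the test is correspondingly worthless.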


Latin Squares 


25.27. We now proceed to consider a different type of design, which has been freely 
applied in agriculture but may also be applied to other forms of inquiry. Suppose we 
have a variety of barley to test and five different treatments to apply. We will assume 
that replication has been considered necessary and will replicate five times, the same number 
as the treatments. We will then divide our block into 25 plots like a chessboard (though 
the plots may be rectangular and need not be exact squares, provided they are all the same 
size). Each row may be considered a replication of the five treatments, and this itself 
involves the appearance of each treatment once and only once in each row. Can we extend 
the arrangement and ensure that in addition the treatments will occur just once in each 
column ? 

The answer is affirmative, as the following example shows:

[5 × 5 Latin square in the treatments A, B, C, D, E]          (25.15)

An arrangement of this kind is called a “Latin square”. It was studied extensively by
Euler in the eighteenth century, though not of course from the statistical viewpoint. 
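One simple way of producing such an arrangement is to shift the treatments cyclically from row to row. The Python sketch below builds a square in this way and checks the defining property; it is offered only as an illustration, and the cyclic square it prints is not necessarily the one displayed in (25.15).

```python
def cyclic_latin_square(letters):
    """Build a Latin square by shifting the letters one place in each new row."""
    p = len(letters)
    return [[letters[(r + c) % p] for c in range(p)] for r in range(p)]

def is_latin_square(square):
    """True if every symbol occurs exactly once in each row and each column."""
    p = len(square)
    symbols = set(square[0])
    if len(symbols) != p:
        return False
    rows_ok = all(len(row) == p and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(p)} == symbols for c in range(p))
    return rows_ok and cols_ok

square = cyclic_latin_square("ABCDE")
for row in square:
    print(" ".join(row))
print(is_latin_square(square))
```

A cyclic square is highly systematic along its diagonals, which is one reason why in practice a square is chosen at random from those available.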


25.28. The advantage of this arrangement lies in the fact that it eliminates possible 
correlational effects due to fertility gradients in the soil or accidental circumstances which 
may exercise a “patchy” influence on the whole block. If we could be sure that there
were no such influences at work, and that the soil was entirely homogeneous in the block, 
it would not matter where the treatments were placed ; but by imposing the restriction 
that no treatment appears more than once in the same row or column we remove at least 
horizontal and vertical gradients from our comparisons. Suppose in fact that there were 
gradients running across the block and down it. When we work out the mean yield of the 
treatment A we shall add together five values, one of each in the various rows and columns. 
Similarly for B, so that a comparison of A and B is not affected by the systematic influences, 
which work equally on both.

It is not, of course, true that the Latin square arrangement eliminates every effect due 
to soil heterogeneity. There might be systematic effects running diagonally which might 
still remain. It is, however, clear that in removing the effects in two perpendicular
directions we have substantially improved the comparison of mean yields as compared with
a systematic arrangement.


25.29. The analysis of variance of a p × p Latin square may be carried out in the
following form:

Sum of squares                  d.f.
Between rows                    p − 1
Between columns                 p − 1
Between treatments              p − 1
Residual                        (p − 1)(p − 2)
TOTAL                           p² − 1                    (25.16)

and the four constituent sums are, on the hypothesis of homogeneity, distributed as χ²
independently. Before proving this result we will consider an example.


Example 25.1 (from Thomson, Brit. J. Educ. Psych., 1941, 11, 135; data by S. D. Nisbet). 


A set of children were divided into four equal groups and each group was given four 
lists of words to test spelling ability. Each list formed one of four different types of test
which we denote by A, B, C, D. The arrangement of the experiment is shown in the
following table, together with the total scores of the corresponding groups:

                           Groups of children
Lists of words       1        2        3        4       TOTALS
      1            A 81     B 41     C 44     D 53       219
      2            D 38     A 97     B 42     C 49       226
      3            C 31     D 43     A 67     B 36       177
      4            B 57     C 33     D 43     A 81       214
   TOTALS           207      214      196      219       836


For instance, the first group of children had the first list of test A, the second of test
D, and so on. No group had the same lists as another group, and each list was used exactly
once. The scores (corresponding to yields in the agricultural case) were in fact the number 
of words spelled wrongly in a prior test but correctly in this test. 

The above table, of course, does not represent anything corresponding to the physical 
layout of an agricultural experiment, but it shows how a similar object can be secured to 
the avoidance of contiguous effects. Since it is possible that some relationship may exist
between the lists of words and the tests (e.g. by accident one list might be particularly 
unsuitable for a test), we wish to ensure that not only will each group of children have 
each of the four tests, but that no list shall be given more than once and every list at least 
once. This is precisely what the Latin square accomplishes. The fact that the diagonal 
arrangement of the letters is systematic does not affect the present inquiry, though in an
agricultural experiment a systematic diagonal fertility gradient might affect comparisons
between treatments.
An analysis of variance on the usual lines gives the following results:

                         Sum of Squares.     d.f.     Quotient.
Lists (rows)                   359.5           3        119.83
Groups (columns)                74.5           3         24.83
Tests (treatments)            4626.5           3       1542.17
Residual                       606.5           6        101.08
TOTAL                         5667.0          15


The differences between lists are evidently not significant, from which we should conclude
that they appear to be on a par so far as these tests are concerned. The quotient due to
groups indicates that the children are more alike than chance would lead us to expect, but
not significantly so, for the variance ratio 101.08/24.83 = 4.1, ν₁ = 6, ν₂ = 3, is not
significant. On the other hand, the quotient due to tests is very significant, the ratio
1542.17/101.08 = 15.3, ν₁ = 3, ν₂ = 6, being beyond the 1 per cent point. We conclude
that there do exist differences between the tests. 
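The arithmetic of this example is easily reproduced. The Python sketch below takes the scores of Example 25.1, with the tests read row by row as ABCD, DABC, CDAB, BCDA (rows being lists and columns groups of children), and computes the sums of squares by the usual method of totals. The function and variable names are our own.

```python
TESTS = ["ABCD", "DABC", "CDAB", "BCDA"]     # rows = lists, columns = groups
SCORES = [[81, 41, 44, 53],
          [38, 97, 42, 49],
          [31, 43, 67, 36],
          [57, 33, 43, 81]]

def latin_square_anova(layout, data):
    """Sums of squares for rows, columns, treatments, residual and total."""
    p = len(data)
    grand = sum(map(sum, data))
    correction = grand ** 2 / p ** 2
    row_tot = [sum(row) for row in data]
    col_tot = [sum(data[r][c] for r in range(p)) for c in range(p)]
    trt_tot = {}
    for r in range(p):
        for c in range(p):
            trt_tot[layout[r][c]] = trt_tot.get(layout[r][c], 0) + data[r][c]

    def ss(totals):
        # each total is a sum of p observations
        return sum(t * t for t in totals) / p - correction

    ss_total = sum(x * x for row in data for x in row) - correction
    out = {"rows": ss(row_tot), "columns": ss(col_tot),
           "treatments": ss(trt_tot.values())}
    out["residual"] = ss_total - sum(out.values())
    out["total"] = ss_total
    return out

result = latin_square_anova(TESTS, SCORES)
p = len(SCORES)
df_main, df_resid = p - 1, (p - 1) * (p - 2)
ratio_tests = (result["treatments"] / df_main) / (result["residual"] / df_resid)

for name, value in result.items():
    print(name, value)
print("variance ratio for tests:", round(ratio_tests, 2))
```

The ratio for tests is then referred to the variance-ratio distribution with 3 and 6 degrees of freedom, as in the text.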


Construction of Latin Squares

25.30. The number of possible Latin squares of order p is very large for high values
of p. There are, for example, 576 squares of order 4; 161,280 squares of order 5; 812,851,200
of order 6 and 61,479,419,904,000 of order 7. Up to this order they have been enumerated.
Although many examples of squares of higher orders are known, the problem of enumeration 
for p ≥ 8 awaits solution. Details and examples will be found in Fisher and Yates’
Statistical Tables. 

By interchanging rows and columns the square can always be brought to a form in 
which the top row and left-hand column are in the order ABC, etc. It is then said to be 
a “standard square”. For instance, there are four standard squares of the fourth order:

ABCD      ABCD      ABCD      ABCD
BADC      BCDA      BDAC      BADC
CDBA      CDAB      CADB      CDAB          (25.17)
DCAB      DABC      DCBA      DCBA


From each of these, 144 (= 4! 3!) squares may be derived by permuting all columns and 
all rows except the first. (There is no point in permuting the first row, because the result 
would be a repetition of squares already obtained with an interchange of the letters 
A ... D, not an essentially different layout.) The total number of squares, as stated
above, is therefore 4 x 144 = 576. 
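For so small an order the count can be confirmed by brute force. The Python sketch below builds every square of order 4 row by row, rejecting any row that repeats a symbol in a column, and then picks out the standard squares. It is an illustrative check only (the function names are our own) and would be far too slow for orders much above 5.

```python
from itertools import permutations

def latin_squares(p):
    """Yield every Latin square of order p on the symbols 0..p-1."""
    perms = list(permutations(range(p)))

    def extend(rows):
        if len(rows) == p:
            yield tuple(rows)
            return
        for perm in perms:
            # a new row is admissible if it clashes with no earlier row in any column
            if all(perm[c] != row[c] for row in rows for c in range(p)):
                yield from extend(rows + [perm])

    yield from extend([])

def is_standard(square):
    """True if the first row and first column are both in natural order."""
    p = len(square)
    natural = list(range(p))
    return list(square[0]) == natural and [row[0] for row in square] == natural

squares = list(latin_squares(4))
standard = [s for s in squares if is_standard(s)]

print(len(squares), len(standard))
for s in standard:
    print(" / ".join("".join("ABCD"[v] for v in row) for row in s))
```

Dividing the total by the number of standard squares recovers the factor 4! 3! = 144 given above.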

It is only necessary to specify the standard squares. To select a Latin square at 
random we choose a standard form at random and then permute rows and columns at 
random, the randomising process being most conveniently carried out by Sampling 
Numbers. For squares of order 8 or more, where the standard types have not been
enumerated, we can only choose from among those squares which are known, and hence select
one at random from a restricted set of all possible squares.


Analysis of Variance for Latin Squares

25.31. We must now justify our assertion that the Latin square may be analysed 
in the form (25.16), and that the z-test applies to the variance ratios which arise in the 
analysis. 

For an ordinary two-way classification we have

$$\sum (x_{rc} - \bar{x}_{..})^2 = \sum (\bar{x}_{r.} - \bar{x}_{..})^2 + \sum (\bar{x}_{.c} - \bar{x}_{..})^2 + \sum (x_{rc} - \bar{x}_{r.} - \bar{x}_{.c} + \bar{x}_{..})^2 .$$

Thus, if $x_r$ is the mean of rows and $x_c$ that of columns in the Latin square, we have, writing
$\bar{x}$ for $\bar{x}_{..}$,

$$\sum (x_{rc} - \bar{x})^2 = \sum (x_r - \bar{x})^2 + \sum (x_c - \bar{x})^2 + \sum (x_{rc} - x_r - x_c + \bar{x})^2 \qquad (25.18)$$

and the three parts on the right are distributed independently as χ² with p − 1, p − 1 and
(p − 1)(p − 1) degrees of freedom respectively.

Now

$$\sum (x_{rc} - x_r - x_c + \bar{x})^2 = \sum (x_t - \bar{x})^2 + \sum (x_{rc} - x_r - x_c - x_t + 2\bar{x})^2 + 2 \sum (x_t - \bar{x})(x_{rc} - x_r - x_c - x_t + 2\bar{x}) \qquad (25.19)$$

where $x_t$ is the mean of treatments.

Consider the cross-product term in (25.19). The summation takes place over all p²
values in the Latin square. Let us confine our attention to the summation for some
particular treatment. For this summation the factor $x_t - \bar{x}$ is constant. Summation for
the other factor gives

$$\sum (x_{rc} - x_r - x_c - x_t + 2\bar{x}) = p x_t - \sum x_r - \sum x_c - p x_t + 2p\bar{x} \qquad (25.20)$$

and since one treatment occurs in each row and column,

$$\sum x_r = \sum x_c = p\bar{x} \qquad (25.21)$$

and hence the sum (25.20) vanishes.

Thus the cross-product in (25.19) vanishes also and we have

$$\sum (x_{rc} - \bar{x})^2 = \sum (x_r - \bar{x})^2 + \sum (x_c - \bar{x})^2 + \sum (x_t - \bar{x})^2 + \sum (x_{rc} - x_r - x_c - x_t + 2\bar{x})^2 . \qquad (25.22)$$

This gives us the analysis of the sums of squares, and it only remains to show that the third
term on the right in (25.22) is independent of the fourth. It will then follow that the four
terms are distributed independently with p − 1, p − 1, p − 1 and (p − 1)(p − 2) degrees
of freedom.

The required property of independence can be established directly, but it also follows 
from considerations of symmetry in the Latin square which have an interest of their own. 
We have regarded the square as composed of rows and columns, with treatments allotted 
in a certain way ; but by rearrangement we can equally well regard it as composed of rows 
and treatments with columns allocated in a certain way. For instance, if we take the 
first standard square in (25.17) we may write it:

                  Treatment :
               A      B      C      D
Rows :   1     C₁     C₂     C₃     C₄
         2     C₂     C₁     C₄     C₃
         3     C₄     C₃     C₁     C₂
         4     C₃     C₄     C₂     C₁

where, for instance, treatment A occurs in row 1, column 1 (C₁), row 2, column 2 (C₂), and
so on. This, of course, is not a physical layout, but that is immaterial for present purposes.
It follows that since the sum of squares between columns is independent of the residual in 
(25.22), so also is that between treatments. 
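The algebraic identity (25.22), and the vanishing of the cross-product in (25.19), hold for any set of numbers laid out on a Latin square, and this can be checked numerically. The Python sketch below uses the first standard square of (25.17) as the layout and fills the cells with arbitrary pseudo-random values; the names are our own, and all sums run over the p² cells as in the text.

```python
import random

LAYOUT = ["ABCD", "BADC", "CDBA", "DCAB"]    # first standard square of (25.17)

def decomposition(layout, x):
    """Return total, rows, columns, treatments, residual and cross-product sums."""
    p = len(x)
    mean = sum(map(sum, x)) / p ** 2
    row_m = [sum(x[r]) / p for r in range(p)]
    col_m = [sum(x[r][c] for r in range(p)) / p for c in range(p)]
    trt_m = {}
    for r in range(p):
        for c in range(p):
            trt_m[layout[r][c]] = trt_m.get(layout[r][c], 0.0) + x[r][c] / p
    cells = [(r, c) for r in range(p) for c in range(p)]

    def resid(r, c):
        return x[r][c] - row_m[r] - col_m[c] - trt_m[layout[r][c]] + 2 * mean

    total = sum((x[r][c] - mean) ** 2 for r, c in cells)
    rows = sum((row_m[r] - mean) ** 2 for r, c in cells)
    cols = sum((col_m[c] - mean) ** 2 for r, c in cells)
    trts = sum((trt_m[layout[r][c]] - mean) ** 2 for r, c in cells)
    residual = sum(resid(r, c) ** 2 for r, c in cells)
    cross = sum((trt_m[layout[r][c]] - mean) * resid(r, c) for r, c in cells)
    return total, rows, cols, trts, residual, cross

rng = random.Random(0)                        # arbitrary values, fixed for repeatability
x = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(4)]
total, rows, cols, trts, residual, cross = decomposition(LAYOUT, x)

print(total, rows + cols + trts + residual)   # the two agree up to rounding
print(cross)                                  # zero up to rounding
```

The check concerns only the partition of the sum of squares; the independence of the parts is a distributional property and is not tested by it.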

The variance analysis then takes the form 


                Sum of Squares.                                        d.f.
Rows            $\sum (x_r - \bar{x})^2$                                p − 1
Columns         $\sum (x_c - \bar{x})^2$                                p − 1
Treatments      $\sum (x_t - \bar{x})^2$                                p − 1               (25.23)
Residual        $\sum (x_{rc} - x_r - x_c - x_t + 2\bar{x})^2$          (p − 1)(p − 2)
TOTALS          $\sum (x_{rc} - \bar{x})^2$                             p² − 1


25.32. The above form provides a homogeneity test of the usual kind. If the test
proves significant of heterogeneity we may, in the usual way, consider the hypothesis that

$$x_{rc} = a_r + b_c + c_t + \zeta_{rc} \qquad (25.24)$$

where $\zeta_{rc}$ is normally distributed about zero mean. We leave it to the reader to show, as
in Chapter 23, that in such an event the residual mean square is an unbiassed estimate of
the variance of ζ with (p − 1)(p − 2) degrees of freedom.


25.33. As in the case of randomised blocks, it appears that under certain general 
conditions the z-distribution is reproduced approximately for fixed values which are per- 
muted in all the permissible ways consistent with the Latin square design. We omit an 
investigation into this result (for which see Welch, 1937) as the algebra is considerably 
more complicated than for randomised blocks. The result has been confirmed by a limited
number of experiments. 


Graeco-Latin and Orthogonal Squares. 
25.34. If the two squares

ABCD      ABCD
BADC      CDAB
CDAB      DCBA          (25.25)
DCBA      BADC

are superposed we have the arrangement

AA  BB  CC  DD
BC  AD  DA  CB
CD  DC  AB  BA          (25.26)
DB  CA  BD  AC

in which every possible pair of letters (XY being regarded as different from YX) appears
just once. Such a pair of squares is said to be orthogonal. The form (25.26) is sometimes
written with Greek letters instead of the second Roman set; hence the name of
Graeco-Latin square. It is also possible to superpose a third factor which we will denote by the
numerals 1-4 in such a way that each combination of any pair of types occurs just
once, e.g.

Aα1  Bβ2  Cγ3  Dδ4
Bγ4  Aδ3  Dα2  Cβ1
Cδ2  Dγ1  Aβ4  Bα3          (25.27)
Dβ3  Cα4  Bδ1  Aγ2

Complete sets of orthogonal squares (i.e. those in which there are p − 1 factors for a p × p
square) are known for all prime p and for p = 4, 8 and 9. Curiously, there is no set for
p = 6. Up to and including p = 7 they have been enumerated.

We shall not enter here into the use of these squares in experimental design. They
are generalisations of the Latin square in which, by suitable arrangements, several factors
can be tried out simultaneously, so that all possible combinations of pairs occur an equal
number of times.
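The defining property of orthogonal squares, that every ordered pair of symbols occurs just once on superposition, lends itself to a mechanical check. In the Python sketch below the three squares are those of (25.25) and (25.27), with the second (Greek) square written in Roman letters as in (25.26); the function name is our own.

```python
LATIN = ["ABCD", "BADC", "CDAB", "DCBA"]       # first square of (25.25)
GREEK = ["ABCD", "CDAB", "DCBA", "BADC"]       # second square, in Roman letters
NUMERALS = ["1234", "4321", "2143", "3412"]    # third factor of (25.27)

def are_orthogonal(s1, s2):
    """True if superposing the squares gives every ordered pair exactly once."""
    p = len(s1)
    pairs = {(s1[r][c], s2[r][c]) for r in range(p) for c in range(p)}
    return len(pairs) == p * p

print(are_orthogonal(LATIN, GREEK))
print(are_orthogonal(LATIN, NUMERALS))
print(are_orthogonal(GREEK, NUMERALS))
print(are_orthogonal(LATIN, LATIN))    # a square is never orthogonal to itself
```

Since p² cells can hold at most p² distinct pairs, counting the distinct pairs is enough to establish that each occurs once and only once.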


Confounding 


25.35. It will be evident that if we wish to consider in full a classification according 
to several variates, particularly with replications, the number of individual members in
the sample may be very large. For instance, if we wish to test a variety of barley with 
three different applications of four types of fertiliser, there must be 81 yields even without 
replication, if we want to make all the comparisons possible. Physical considerations may 
make a layout of an experiment on such a scale impossible. The difficulty is possibly more
serious in experiments on expensive animals such as cows. 

Where economy in the size of sample is a very material factor we may be able to reduce 
the sample at the expense of sacrificing some of the less important comparisons. For 
example, to consider once again the case of barley and the effect of fertilisers : we shall 
undoubtedly wish to compare yields of D and not-D, K and not-K, P and not-P, N and 
not-N. We may also wish to compare first-order interactions of the type DK and not-D, K. 
But it is quite possible that interactions of higher order, such as the effect of dung in the 
presence of two other fertilisers, are negligible. Where we are prepared to assume that this 
is so, on the basis of prior evidence or otherwise, we can dispense with certain information 
and still make the comparisons we wish while retaining properties of orthogonality. 


25.36. Consider, as an illustration, an experiment with three fertilisers, each of which 
is applied or not applied, say N, P and K, and four replications. In the ordinary way 
there would be 32 plots and we should have an analysis of variance as follows, assuming 
that block-treatment interactions may be regarded as part of the residual :— 

Sum of squares                  d.f.
Blocks                            3
N                                 1
P                                 1
K                                 1
NP                                1
NK                                1
PK                                1
NPK                               1
Residual                         21
TOTAL                            31

Now suppose that we divide our main blocks into two sub-blocks, the first containing
the treatments
O (None), NP, NK, PK,                                   (25.28)
and the second the treatments
N, P, K, NPK.                                           (25.29)
We may then analyse the variance as follows, regarding the sub-blocks as blocks of four
plots each:

Sum of squares                  d.f.
Blocks                            7
N                                 1
P                                 1
K                                 1
NP                                1
NK                                1
PK                                1
Residual                         18
TOTAL                            31

In fact, if we wish to compare the yields with N and those without N, i.e.

N + NPK + NP + NK
with
O + P + K + PK



it will be seen that we add two members from (25.28) and two from (25.29), so the difference 
is not affected by block differences; and similarly for the other comparisons. Such a 
design is said to be balanced, and the interaction NPK is confounded with block-differences,
since in the eight blocks it cannot now be isolated from block effects. The advantage of 
the second design over the first is that, without losing anything appreciable in comparisons 
between treatments, we have gained a good deal in the assessment of block effects ; for the 
residual has only declined from 21 to 18 d.f. whereas the sum of squares between blocks 
has increased from 3 to 7 d.f. 
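The balance of the design can be verified by attaching the usual signs to the treatments in each comparison and adding them within a sub-block. The Python sketch below is an illustration under our own conventions: a treatment counts +1 for each factor of the effect that it contains and −1 for each that it lacks, the signs being multiplied together, and sub-block 1 is taken to hold the treatments with an even number of fertilisers applied, as in (25.28).

```python
from itertools import product

FACTORS = "NPK"
treatments = [frozenset(f for f, on in zip(FACTORS, flags) if on)
              for flags in product((0, 1), repeat=3)]

def sign(treatment, effect):
    """Sign of a treatment in the comparison for an effect such as N, NP or NPK."""
    s = 1
    for f in effect:
        s *= 1 if f in treatment else -1
    return s

# Sub-block 1: none, NP, NK, PK (25.28); sub-block 2: N, P, K, NPK (25.29).
block_of = {t: 1 if len(t) % 2 == 0 else 2 for t in treatments}

def block_imbalance(effect):
    """Sum of the signs within sub-block 1; zero means block differences cancel."""
    return sum(sign(t, effect) for t in treatments if block_of[t] == 1)

for effect in ("N", "P", "K", "NP", "NK", "PK", "NPK"):
    print(effect, block_imbalance(effect))
```

Only for NPK do all four treatments of a sub-block carry the same sign, so that this comparison coincides with the difference between sub-blocks and cannot be separated from it.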


25.37. The ideas of orthogonality, randomisation, balance and confounding have 
been developed to an advanced degree and with great ingenuity, particularly by Fisher 
and Yates. The slight sketch we have given of the methods in this chapter is intended to 
be no more than illustrative of the way in which the theory of experimental design is capable 
of development, at least in certain fields, and the manner in which efficiency may be imported 
into a practical inquiry by a due regard to theoretical requirements of the design. For a 
comprehensive account of this branch of the subject the reader should consult Fisher’s 
Statistical Methods and Design of Experiments, Yates (1937b), and a useful introductory
account by Goulden (1939). At this point we leave these particular topics and return to
certain general matters.


Design and Randomisation 

25.38. Whenever an inference is to be made, and particularly where hypothetical
populations are concerned, the reader will find it useful to ask himself what precisely is the
population under consideration. We can illustrate the point very usefully by discussing
a subject on which there has recently been difference of authoritative opinion: that of
occasional conflict between the requirements of balancing and randomisation.


25.39. Consider in the first place the testing of a cereal under two treatments, denoted 
by A and B; and to simplify matters as much as possible, suppose we are to sow eight 
plots in a straight line. In what order shall we allot the treatments ? 

If the plots are not too large so that the row covers a big area, it is quite possible that 
there may be a trend of fertility in the soil itself which will affect yields differentially and 
hence interfere with comparisons which we might make. Suppose that we do wish to 
guard against a fertility gradient so far as possible. We might then decide on one of the 
"balanced" arrangements:

AABBBBAA (25.30)
ABBAABBA (25.31)
ABABBABA (25.32)


As will be easily seen, if there is a linear gradient in fertility along the row the means of
A and B treatments respectively will be affected to the same extent and hence their difference
unaffected. For instance, consider (25.30) and suppose the linear gradient is represented
by an additive factor $q + kp$, $k = 1, \ldots, 8$. On the hypothesis that the remaining
effect consists of a constant $a$ for A-treatments with a normal residual $\xi$, and similarly
for B, the yields are

A-treatments: $q + p + a + \xi_1$, $q + 2p + a + \xi_2$, $q + 7p + a + \xi_7$, $q + 8p + a + \xi_8$
B-treatments: $q + 3p + b + \xi_3$, $q + 4p + b + \xi_4$, $q + 5p + b + \xi_5$, $q + 6p + b + \xi_6$

with means

$\tfrac{1}{4}(4q + 18p) + a + \tfrac{1}{4}(\xi_1 + \xi_2 + \xi_7 + \xi_8)$
$\tfrac{1}{4}(4q + 18p) + b + \tfrac{1}{4}(\xi_3 + \xi_4 + \xi_5 + \xi_6)$

respectively. The difference of these two is independent of q and p.
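
The same arithmetic may be verified for all three balanced arrangements at once. The sketch below is illustrative only: the numerical values of q, p, a and b are arbitrary assumptions, the residuals are set to zero so that only the gradient is in play, and the layout AAAABBBB is added purely as an unbalanced contrast.

```python
def treatment_means(layout, q=10.0, p=0.5, a=3.0, b=1.0):
    """Mean yield of the A and B plots for a layout string such as 'AABBBBAA'.

    Plot k (k = 1, 2, ...) yields q + k*p plus the treatment constant;
    residuals are omitted so that only the fertility gradient matters.
    """
    effect = {"A": a, "B": b}
    yields = {"A": [], "B": []}
    for k, t in enumerate(layout, start=1):
        yields[t].append(q + k * p + effect[t])
    return {t: sum(v) / len(v) for t, v in yields.items()}

def difference(layout, **kw):
    """Mean of A minus mean of B for the given layout."""
    m = treatment_means(layout, **kw)
    return m["A"] - m["B"]

balanced = ["AABBBBAA", "ABBAABBA", "ABABBABA"]
balanced_diffs = [difference(layout) for layout in balanced]   # each equals a - b
unbalanced_diff = difference("AAAABBBB")                      # a - b - 4p
```

With the assumed values the balanced layouts all return a - b = 2, whatever gradient p is supplied, whereas the unbalanced layout returns a - b - 4p, which here happens to be zero: the gradient has entirely masked the treatment difference.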


25.40. The alternative procedure in allotting treatments would be to distribute 
them at random. Such balanced arrangements as (25.30)-(25.32) might then arise by 
chance. But we might also get such an arrangement as 


AAAABBBB (25.33)


What are we to do in such circumstances? If we reject this arrangement we are rejecting 
the random allocation of treatments in favour of systematisation. If we accept it we 
know quite well that a fertility gradient, if it exists, will invalidate the inquiry. 

The reader will no doubt agree that, if other things are equal, the balanced arrange- 
ment is better than the arrangement (25.33). What we have to examine is whether other 
things are equal; in short, whether in rejecting randomisation we have lost anything 
useful in the testing of significance. 


25.41. Consider a rather more general case in which an experimental area is laid
out in p blocks of q treatments each. If the subscript j refers to blocks and k to treatments,
we have the usual analysis with sum of squares between blocks (p − 1 d.f.), between
treatments (q − 1 d.f.), and residual ((p − 1)(q − 1) d.f.).

Now we have seen that if the individual plot-yield can be regarded as a block effect
plus a treatment effect plus a normal residual with constant variance from plot to plot,
the significance of treatment effects can be judged from the z-test in the usual way by
comparing sum of squares between treatments with the residual sum of squares. This
is true whether treatments are allocated at random or not.

But suppose we wish to adopt the alternative viewpoint of 23.41 and make the infer- 
ence in the set of values obtained by permuting the observed values. These permutations 
will not affect the block means or the total mean, and hence the sum of squares between 
blocks remains constant. The remaining part of the analysis may be written— 


              Sum of Squares                                                            d.f.
Treatment     $S_1 = \Sigma (x_{\cdot k} - x_{\cdot\cdot})^2$                           $q - 1$
Residual      $S_2 = \Sigma (x_{jk} - x_{j\cdot} - x_{\cdot k} + x_{\cdot\cdot})^2$     $(p - 1)(q - 1)$
TOTALS        $S_3 = \Sigma (x_{jk} - x_{j\cdot})^2$                                    $p(q - 1)$          (25.34)

(all sums being taken over the pq plots).

Rather remarkably, the z-test holds for the ratio

$$\frac{S_1\,(p - 1)(q - 1)}{(q - 1)\,S_2}$$

provided that treatments are allocated at random, independently of the distribution of
residual effects in individual plots.


25.42. Consider, then, the population of values, $(q!)^{p-1}$ in number, obtained by
permuting the observed values. The total sum of squares $S_3$ in (25.34) is the same for all
members. Consequently if $S_1$ is too great, $S_2$ must be too small and vice-versa; and in
general, if we confine ourselves to certain layouts and reject others, all the possible values
of $S_1$ cannot appear. It is this fact which has been seized on by advocates of randomisation.
They point out that for balanced layouts $S_1$ tends to be smaller than for random
layouts (a conclusion supported by experiment); consequently that the test of significance
is invalidated and the estimate of error $S_2$ too big. The difference between the two modes
of thought may be expressed briefly in this way : with balanced layouts the real error is 
reduced but the estimate of error is too large, so that the significance of the result is more 
in doubt; whereas with random layouts the estimate of error is exact but the error itself 
may be larger. The question is whether one prefers to be nearer the truth without knowing 
how near, or farther from the truth with a knowledge of the limits of error. 
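
The constancy of the total sum of squares under permutation, and the resulting see-saw between the treatment and residual sums, can be seen numerically. The following is a sketch under stated assumptions: the yields are invented for illustration, and "permuting" is taken to mean reassigning the observed yields to treatments at random within each block.

```python
import random

def sums_of_squares(table):
    """table[j][k] is the yield in block j under treatment k.

    Returns (S1, S2, S3) as in (25.34), every sum running over all plots.
    """
    p, q = len(table), len(table[0])
    grand = sum(map(sum, table)) / (p * q)
    block_mean = [sum(row) / q for row in table]
    treat_mean = [sum(table[j][k] for j in range(p)) / p for k in range(q)]
    cells = [(j, k) for j in range(p) for k in range(q)]
    s1 = sum((treat_mean[k] - grand) ** 2 for j, k in cells)
    s2 = sum((table[j][k] - block_mean[j] - treat_mean[k] + grand) ** 2 for j, k in cells)
    s3 = sum((table[j][k] - block_mean[j]) ** 2 for j, k in cells)
    return s1, s2, s3

def permute_within_blocks(table, rng):
    """Reassign the observed yields to treatments at random inside every block."""
    return [rng.sample(row, len(row)) for row in table]

rng = random.Random(0)
# Hypothetical yields: 4 blocks of 3 treatments (illustrative numbers only).
observed = [[12.1, 14.3, 11.8],
            [10.4, 13.9, 10.9],
            [11.7, 15.2, 12.5],
            [9.8, 12.6, 10.1]]
results = [sums_of_squares(permute_within_blocks(observed, rng)) for _ in range(200)]
```

In every permuted table the third component is unchanged and equals the sum of the first two, so that any layout rule which systematically lowers the treatment sum of squares must raise the residual sum by the same amount.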


25.43. For details of the controversy on this topic the reader may consult the papers 
referred to at the end of the chapter. It brings into prominence an important question 
of inference which can only be decided by the experimenter himself. If he chooses to 
regard any act of experimentation as one of a large population of such acts, to be carried 
out by himself or other workers, he may prefer randomisation in all circumstances, not- 
withstanding that every now and again he will hit by chance on a design which he knows 
is likely to give misleading results. But if he cannot take this very detached attitude (and 
most experimenters, being human, would think it poor compensation that their own errors 
are balanced by the better luck of other people) then he will prefer to design a balanced
layout, even if the exactitude of his tests of significance is impaired.

25.44. We must, however, not leave the reader with the impression that the
desiderata of both schools of thought are totally incompatible. It frequently happens that 
one can select a design which is both balanced and random. The Latin square is a good 
example. By imposing the restriction that a treatment must not appear more than once 
in a row or column we remove to some extent the interference of fertility gradients; by 
requiring that it shall appear just once we balance the design; and by leaving the rest 
of the layout to be determined by a random selection from all possible Latin squares of 
that order we randomise so as to reproduce the distribution of the variance ratio in the 
required form, thus, as "Student" remarked, "conforming to all the principles of allowed
witchcraft".

EXERCISES


25.1. A population is given by specifying the frequencies in comparatively narrow
ranges of one variate, the frequency in the ith range being $N_i$ and ranges being of equal
width. Show that if the population frequencies are large, the best estimator of the mean
of a second variate which is linearly related to the first (in the sense of the unbiassed estimator
of minimum variance) in a sample obtained by taking $n_i$ members from the ith range is
given when $n_i$ is proportional to $N_i$.


25.2. Extend the result of the previous exercise to the case where ranges are of 
unequal width. 

If the number of farms in England and Wales is known in the acreage ranges 0—49, 
50-99, 100-199, 200-499, 500 and over, what sampling proportions would you take in the 
various ranges to estimate the total acreage under wheat ? 


25.3. If a variate $\xi$ can be regarded as the sum of a systematic component $\xi(x)$ and
an uncorrelated random component $\epsilon_1$, and $\eta$ similarly as $\eta(x) + \epsilon_2$, and if the random
components are uncorrelated with each other, show that

$$r(\xi, \eta) = \frac{\operatorname{cov}\{\xi(x), \eta(x)\}}{\left[\{\operatorname{var}\xi(x) + \operatorname{var}\epsilon_1\}\{\operatorname{var}\eta(x) + \operatorname{var}\epsilon_2\}\right]^{1/2}}.$$

Hence, if a population is divided into strata the correlation between $\xi$ and $\eta$ for these strata
will, in general, be less than that obtained by combining strata to obtain larger units;
and as the strata are further subdivided the correlation between $\xi$ and $\eta$ tends to zero.
(Spearman, 1907, Am. J. Psych., 18; Wold, 1938a.)


25.4. Illustrate the effect of the foregoing exercise by calculating the correlation 
coefficients for the data of Table 14.4 (vol. I, p. 333), (a) by adding the variates in pairs 
and so obtaining 24 values; (b) by repeating the operation and obtaining 12 values; 
and (c) by repeating the operation and so obtaining 6 values. 


25.5. (Markoff's theorem.) Consider a sample of n independent values $x_1, \ldots, x_n$,
$x_i$ being drawn from a population $\Pi_i$ with mean $\mu_i$ and variance $\sigma_i^2$. Suppose we have
a function $\theta$ defined by

$$\theta = \sum_{j=1}^{s} b_j p_j,$$

where the b's are known and the parameters $p_j$ depend on the $\mu$'s according to the equation

$$\mu_i = \sum_{j=1}^{s} a_{ij} p_j, \qquad s < n,$$

the a's also being known. Then an unbiassed estimator of $\theta$, say t, with minimum variance
may be written

$$t = \sum_{j=1}^{n} \lambda_j x_j.$$

Show that the function t is given by substituting for the p's in the expression for $\theta$ the
functions q given by minimising

$$\sum_{i=1}^{n} \frac{1}{\sigma_i^2} \left\{ x_i - \sum_{j=1}^{s} a_{ij} q_j \right\}^2$$

with regard to the q's considered as independent variables.
Show further that if this minimum value is S, the estimated variance of t is 
8, 


n—s 


E (o). 


25.6. In a feeding experiment there are given five different foods, each of which is 
available in four grades. It is desired to feed each animal with one grade of each food, 
but only one, so that a comparison may be made of the effect of the different grades of any 
particular food. Use the Graeco-Latin square to show how the feeding can be carried
out.


25.7. A water diviner is to be taken to ten spots and asked to say whether water 
is present below the surface. It is decided to choose five spots where water is known for 
certain to exist and five where it is known not to exist. The order in which the spots are 
to be presented is determined by spinning a coin, heads denoting water and tails not-water. 
The spinning of the coin results in the first five trials giving heads. Would you
accept this result or spin again?


25.8. Show that a Latin square may be regarded as a three-way classification in
which $p^2$ members are not zero, but $p^3 - p^2$ members vanish. Derive the analysis of
variance for the Latin square from this approach and generalise it to the Graeco-Latin
square.


CHAPTER 26 
GENERAL THEORY OF SIGNIFICANCE-TESTS—(1) 


Hypotheses to be Considered 

26.1. The kind of hypothesis which we test in statistics is more restricted than the 
general scientific hypothesis. It is a scientific hypothesis that every particle of matter 
in the universe attracts every other particle, or that Homer was blind; but these are not 
hypotheses such as arise for testing from the statistical viewpoint. A review of the various 
tests which have been introduced earlier in this book indicates that the great majority 
specify something about a population. Some merely assert a general fact such as "the
population is continuous" or "the population is rectangular". Others are more definite,
as for instance "the population is normal and has a mean $\mu_0$"; and again others are less
definite in one direction and more definite in another, e.g. "the population has unit
variance". It is also usually a part of the hypothesis that the sample from which the inference
is being made was obtained by a random process. 


26.2. Suppose we have a set of random variables $x_1, \ldots, x_n$. In the sample space
W of n dimensions the sample-point whose co-ordinates are $x_1, \ldots, x_n$ determines a point
E, say, with a distribution function which we may write as P(E). If w is any region in
W, we may derive the probability that E falls in w, say $P(E \in w)$. Then we shall say that
any hypothesis concerning the law $P(E \in w)$ is a statistical hypothesis. If it determines
the law completely we shall call it simple. In the contrary case it is said to be composite.

For instance, in testing the significance of the mean of a sample of n, it is a statistical
hypothesis that the parent is normal. This is composite, as also is the hypothesis that
the parent is normal with mean $\mu$ or the hypothesis that the parent is normal with variance
$\sigma^2$. The hypothesis that the parent is normal with mean $\mu$ and variance $\sigma^2$ is simple because
then the parent is fully determined.


Example 26.1 

In sampling from a population dichotomised into classes possessing the attributes
A or not-A, say in proportion $\varpi$ and $\chi$ (= 1 − $\varpi$), the sampling distribution is the binomial
$(\chi + \varpi)^n$. This is completely determined by the value of $\varpi$, and hence a hypothesis as
to the value of $\varpi$ is simple. Such, for instance, would be the hypothesis that male and
female births occur in equal proportions. Similarly, in a multiple classification with
proportions $\varpi_1, \varpi_2, \ldots, \varpi_s$ a simple hypothesis would specify values for all the $\varpi$'s; if only
one were specified and s were greater than two the hypothesis would be composite.

In sampling from a bivariate normal population characterised by two means, two
variances and a correlation, a hypothesis about any one parameter would be composite,
and similarly for a hypothesis concerning two, three or four parameters. Only if all five
were specified in addition to the normality of the parent would the hypothesis be simple;
and this notwithstanding the fact that the sampling distribution of the means is independent
of the other three parameters, and that of the correlation coefficient independent
of the other four.


26.3. A hypothesis which determines the law $P(E \in w)$ completely except for v
parameters is sometimes said to have v degrees of freedom. Such a hypothesis may be
regarded as an aggregate of simple hypotheses. For instance, the hypothesis that a population
is normal with mean $\mu$ is the aggregate, for all $\sigma^2$, of hypotheses that it is normal with
mean $\mu$ and variance $\sigma^2$.


26.4. The kind of argument we have used in testing hypotheses, for both large and
small samples, is of this character: assuming that the hypothesis is true, we can, with
any assigned probability $\alpha$, find a region $w_0$ in the sample space W such that the probability
of E falling in $W - w_0$ is $\alpha$. We call $W - w_0$ the region of acceptance and the complementary
domain $w_0$ the critical region. (This is the nomenclature of Chapter 19.) If our observed
E falls in $w_0$ we reject the hypothesis; if not we accept it. As a rule, in practical cases,
our regions $w_0$ are determined by the values of some statistic such as $\bar{x}$ in testing the mean.


Errors of First and Second Kind 


26.5. In general, as we saw in Chapter 19, there are many possible regions of acceptance
for any given hypothesis and any given probability level $\alpha$. For all of them we shall
err in proportion $1 - \alpha$ of the cases in the long run by rejecting the hypothesis if E falls
in the critical region, provided that the hypothesis is true. But what about the case when
it is not true? We cannot ignore this case, for its possible existence is the very reason for
carrying out the test. It is of no use whatever to know merely what the test will do when
the hypothesis is true without regard to its behaviour in the contrary case; for if we are
to consider only the events which happen when the hypothesis is true we have no right to
use a test based on that assumption to reject it.

By having regard to the behaviour of the test when the hypothesis is not true we are
able to lay down criteria for choosing among the various tests obeying the rule
$$P\{E \in w_0 \mid H_0\} = 1 - \alpha,$$ (26.1)
where $H_0$ is the hypothesis. In fact we shall seek for the test which, while obeying (26.1),
minimises the risk of accepting $H_0$ when an alternative hypothesis $H_1$ is true and $H_0$
accordingly is false. That is to say, we shall endeavour to find $w_0$ such that, in addition to (26.1),
we also have
$$1 - P\{E \in w_0 \mid H_1\} = \text{minimum}.$$ (26.2)


26.6. From a slightly different viewpoint we may say that there are two possible
errors in judging a statistical hypothesis:

(a) We may reject it when we ought to accept it, that is, when it is true.

(b) We may accept it when we ought to reject it, that is, when it is false.

These are known as errors of the first and second kind respectively. The error of the
first kind we can control exactly by setting up the proper region of acceptance determined
by $\alpha$. Errors of the second kind cannot be controlled in this way, but we can sometimes
calculate their probabilities, and in any case can try to reduce them to a minimum. This
is the fundamental idea, first given explicit expression by Neyman and E. S. Pearson,
which determines most of the work in the present and succeeding chapters.


26.7. The possibility of finding regions of acceptance obeying (26.2) clearly depends
on a precise specification of what alternative hypotheses are under consideration. We
had better emphasise the importance of this point. It is customary to speak, and even,
in a loose kind of way, to think of testing a hypothesis without reference to alternatives.
To take the case of testing for normality, we often say that the hypothesis under test is
that the population is normal without specifying what other form it might have. The
reader may say that the alternative he has in mind is merely the negation of the hypothesis,
namely that the population is not normal. But if so he will find it very difficult (in my
own view impossible) to justify any of his tests on a logical basis. He will calculate certain
statistics and accept the hypothesis if their values are consonant with the normal values ; 
but it will always be possible to find other populations for which the observed values are 
even closer to expectation. If agreement between theoretical and observed values is the 
criterion he should reject normality in favour of these alternative hypotheses. It is not 
until he specifies his alternatives and considers errors of the second kind that some firm 
foundation for intuitive processes begins to appear. 

26.8. Perhaps it may help to clarify the fundamental concepts of the present approach
if we consider a simple illustration where the hypothesis under test $H_0$ is simple and there
is only one alternative $H_1$ which is also simple. In Fig. 26.1 we show diagrammatically the
scatter of sample-points which would arise in samples of two, $x_1$ and $x_2$, the cluster on the
right being that due to $H_0$ and the one on the left to $H_1$. In practice, of course, the sampling
distributions are more usually continuous, but the dots will indicate roughly the condensation
of sample density round central values.

Fig. 26.1 (see text).

In determining the critical region we have to find an area in the $(x_1, x_2)$ plane such that
its "content" is $1 - \alpha$. Two possible areas are shown, $w_0$ being the area to the left of
the line PQ, and $w_0'$ the area between the lines AB and BC. In either case the proportion
in the critical regions of the frequency on hypothesis $H_0$ is $1 - \alpha$, and if we reject $H_0$
whenever the sample-point falls in $w_0$ (and similarly for $w_0'$) we shall commit an error of the first
kind in proportion $1 - \alpha$ of the cases in the long run.

Consider errors of the second kind. By using the region $w_0$ we should reject $H_0$ (and
therefore accept $H_1$) every time the sample-point arose from $H_1$, that is to say in practically
all the cases where $H_1$ was true, since nearly all the sample-points arising from $H_1$ lie in
$w_0$. Errors of the second kind are therefore very rare. On the other hand, if we were to
use $w_0'$ we should accept $H_0$ every time a sample-point arose from $H_1$ but did not fall between
the lines AB and BC, that is to say fairly frequently. Clearly $w_0$ is the better critical
region and has a much smaller error of the second kind than $w_0'$.
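
A numerical counterpart of this comparison may be helpful. The sketch below does not reproduce Fig. 26.1; it assumes, purely for illustration, samples of two from a normal parent of unit variance, with mean 2 on the hypothesis tested and mean 0 on the alternative, and compares two regions for the sample mean having the same content 0.05 on the hypothesis tested: a tail on the side of the alternative, and a narrow band centred on the hypothetical mean.

```python
from statistics import NormalDist

size = 0.05                 # content of the critical region on H0 (the text's 1 - alpha)
mu0, mu1, n = 2.0, 0.0, 2   # assumed means on H0 and H1, and the sample size
sd = (1.0 / n) ** 0.5       # s.d. of the mean of n unit-variance observations

under_h0 = NormalDist(mu0, sd)
under_h1 = NormalDist(mu1, sd)

# Region 1: reject when the sample mean falls below c, the tail nearer to H1.
c = under_h0.inv_cdf(size)
second_kind_tail = 1 - under_h1.cdf(c)      # chance of accepting H0 when H1 is true

# Region 2: reject when the sample mean falls in a band centred on mu0
# having the same content on H0.
half = under_h0.inv_cdf(0.5 + size / 2) - mu0
band_content_h0 = under_h0.cdf(mu0 + half) - under_h0.cdf(mu0 - half)
second_kind_band = 1 - (under_h1.cdf(mu0 + half) - under_h1.cdf(mu0 - half))
```

Both regions commit errors of the first kind equally often, but under these assumptions the tail region accepts the hypothesis wrongly in a small minority of cases, whereas the band does so almost always.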


26.9. It is to be noted that the argument does not depend on the relative frequencies
of occurrence of the hypotheses $H_0$ and $H_1$. This is generally true. There is no concealed
form of Bayes' postulate in this approach.


26.10. When there are n variates and p unknown parameters the geometrical repre- 
sentation can be extended by imagining a sample-space W of n dimensions adjoined to 
a parameter space of p dimensions. We cannot draw a picture of such a case on a two- 
dimensional sheet of paper, but the geometrical imagery and terminology of the method 
are frequently useful. A graphical illustration of a two-dimensional sample-space and 
a one-dimensional parameter space has already been given in Fig. 19.3. 


The Power Function 
26.11. If for a simple hypothesis $H_0$, (26.1) is true we define
$$P\{E \in w_0 \mid H_1\} = \beta(H_1 \mid w_0)$$ (26.3)
as the power of the critical region $w_0$ with respect to $H_1$. Clearly the power is greatest
when the probability of an error of the second kind is least.

In the expression on the left of (26.3) we regard the probability that E falls in $w_0$ as
dependent on $H_1$, the hypothesis alternative to $H_0$. In the expression on the right we have
regard to the power of the test for $H_1$ as dependent on $w_0$.

If there exists a particular region $w_0$ with greater power than any other region obeying
(26.1) we shall say that it is the best critical region, and the test based on it will be called
the most powerful test.


26.12. We proceed to consider in turn the following cases:

(a) $H_0$ simple; one alternative $H_1$ which is simple.

(b) $H_0$ simple; an alternative $H_1$ which is composite but can be regarded as an aggregate
of simple alternatives.

(c) $H_0$ and $H_1$ composite but expressible as aggregates of simple hypotheses.


Simple Hypotheses : One Simple Alternative 


26.13. Suppose the parent population is continuous, so that the simultaneous
distribution of the n sample values $x_1, \ldots, x_n$ is continuous; and let the frequency functions
of the sample values on hypotheses $H_0$ and $H_1$ be $p_0(x_1, \ldots, x_n)$ and $p_1(x_1, \ldots, x_n)$
respectively. Write dx for the element $dx_1 \ldots dx_n$. Then we have

$$\int_{w_0} p_0 \, dx = 1 - \alpha$$ (26.4)

and wish to maximise, for variations in the domain $w_0$, the integral

$$\int_{w_0} p_1 \, dx.$$ (26.5)

This is a problem in the Calculus of Variations and is equivalent to maximising
unconditionally the integral

$$\int_{w_0} \left( p_1 - \frac{1}{k} p_0 \right) dx$$ (26.6)

or, what is the same thing, to minimising

$$\int_{w_0} (p_0 - k p_1) \, dx,$$ (26.7)

where k is a constant to be determined by (26.4).
It is known that the condition for a stationary value of (26.7) is that, on the
boundary of $w_0$,

$$p_0 - k p_1 = 0.$$ (26.8)

If the solution is a minimum we have, inside $w_0$,

$$p_0 \le k p_1$$ (26.9)

and outside $w_0$

$$p_0 > k p_1.$$ (26.10)

This solution to the problem is fairly obvious on general grounds. If U is a function which
is sometimes positive and sometimes negative, with a line of demarcation where it is zero
(as must exist in virtue of continuity), we clearly minimise $\int U \, dx$ by taking into the region
$w_0$ all the points for which U is negative and no more. This gives us (26.9) and (26.10),
and the boundary of $w_0$ is the locus for which U vanishes. By convention we regard the
boundary as included in $w_0$, which accounts for the equality in (26.9) and its absence in
(26.10).
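
A discrete analogue allows the result to be checked by exhaustive search. The sketch below is an illustration under assumptions of our own choosing, not the continuous case of the text: a binomial variate with index 8, proportion 0.5 on the hypothesis tested and 0.7 on the alternative, and the constant k = 0.5. It forms the region in which $p_0 \le k p_1$ and compares its power with that of every other set of sample-points whose content on the hypothesis tested is no greater.

```python
from itertools import combinations
from math import comb

n, theta0, theta1 = 8, 0.5, 0.7   # assumed index and the two hypothetical proportions

def pmf(x, theta):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

points = range(n + 1)
p0 = [pmf(x, theta0) for x in points]
p1 = [pmf(x, theta1) for x in points]

k = 0.5                                            # assumed constant of (26.8)
best = [x for x in points if p0[x] <= k * p1[x]]   # points satisfying (26.9)
size = sum(p0[x] for x in best)                    # content on H0 (the text's 1 - alpha)
power = sum(p1[x] for x in best)

def regions_no_larger(limit):
    """Every set of sample-points whose content on H0 does not exceed `limit`."""
    for r in range(n + 2):
        for region in combinations(points, r):
            if sum(p0[x] for x in region) <= limit + 1e-12:
                yield region

max_power = max(sum(p1[x] for x in region) for region in regions_no_larger(size))
```

Here `best` consists of the points 6, 7 and 8, and no competing region of equal or smaller content has greater power. In the discrete case only certain contents are attainable without randomisation, which is why the search is restricted to regions no larger than the one given by the chosen k.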


26.14. The conditions expressed by (26.8), (26.9) and (26.10) are sufficient as well
as necessary. For let $w_1$ be any other region for which

$$\int_{w_1} p_0 \, dx = 1 - \alpha.$$

If $w_0$ and $w_1$ have a common part denote it by $w_{01}$. Then

$$\int_{w_0 - w_{01}} p_0 \, dx = 1 - \alpha - \int_{w_{01}} p_0 \, dx = \int_{w_1 - w_{01}} p_0 \, dx$$

and hence, from (26.9) and (26.10),

$$k \int_{w_0 - w_{01}} p_1 \, dx \ge \int_{w_0 - w_{01}} p_0 \, dx = \int_{w_1 - w_{01}} p_0 \, dx > k \int_{w_1 - w_{01}} p_1 \, dx.$$

Adding to both sides $k \int_{w_{01}} p_1 \, dx$, we have

$$k \int_{w_0} p_1 \, dx > k \int_{w_1} p_1 \, dx,$$ (26.11)

and hence, for positive k, the power of $w_1$ is less than that of $w_0$ and the latter is the best
critical region.

Both in this section and implicitly in the last we have required k to be positive. That
it must be so if $w_0$ is to exist emerges from (26.8), for $p_0$ and $p_1$ are essentially not negative,
and if k were negative no solution for real variate-values would exist.


Example 26.2 
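Consider the normal population

$$dF = \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(x - \mu)^2\} \, dx, \qquad -\infty < x < \infty.$$

Let the hypothesis $H_0$ be that $\mu = a_0$ and the alternative that $\mu = a_1$. We have

$$p_0 = (2\pi)^{-n/2} \exp\Big\{-\tfrac{1}{2} \sum_{j=1}^{n} (x_j - a_0)^2\Big\}.$$

We can conveniently express this in terms of the sample mean $\bar{x}$ and the sample variance
$s^2$, obtaining for the density function

$$(2\pi)^{-n/2} \exp\Big[-\frac{n}{2}\{(\bar{x} - a_0)^2 + s^2\}\Big].$$

A similar expression is found for $p_1$ and thus, for the boundaries of the best critical region,
we have

$$k = \frac{p_0}{p_1} = \exp\Big[-\frac{n}{2}\{(\bar{x} - a_0)^2 - (\bar{x} - a_1)^2\}\Big]
= \exp\Big[\frac{n}{2}(a_0 - a_1)(2\bar{x} - a_0 - a_1)\Big].$$

This yields for the critical region

$$(a_0 - a_1)(2\bar{x} - a_0 - a_1) \le \frac{2}{n} \log k,$$

or

$$(a_0 - a_1)\,\bar{x} \le \tfrac{1}{2}(a_0^2 - a_1^2) + \frac{1}{n} \log k = (a_0 - a_1)\,\bar{x}_0, \text{ say.}$$

If $a_0 > a_1$ the region is then defined by

$$\bar{x} \le \bar{x}_0,$$

but if $a_1 > a_0$ it is defined by

$$\bar{x} \ge \bar{x}_0.$$

The reader should compare the two cases on a diagram similar to that of Fig. 26.1.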
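
In practice the boundary value of the sample mean is fixed directly from the sampling distribution of the mean rather than through k. The following is a minimal sketch for the unit-variance normal parent of this example; the function names `best_region` and `power`, and the numerical values used afterwards, are illustrative assumptions, and `size` denotes the content of the critical region on the hypothesis tested (the text's $1 - \alpha$).

```python
from statistics import NormalDist

def best_region(a0, a1, n, size):
    """Best critical region for H0: mu = a0 against H1: mu = a1 (unit variance).

    Returns (side, xbar0): reject H0 when xbar <= xbar0 if side is 'lower',
    or when xbar >= xbar0 if side is 'upper'.
    """
    sampling = NormalDist(a0, n ** -0.5)   # distribution of the sample mean on H0
    if a1 < a0:
        return "lower", sampling.inv_cdf(size)
    return "upper", sampling.inv_cdf(1 - size)

def power(a0, a1, n, size):
    """Probability that the sample mean falls in the best critical region when mu = a1."""
    side, xbar0 = best_region(a0, a1, n, size)
    alternative = NormalDist(a1, n ** -0.5)
    if side == "lower":
        return alternative.cdf(xbar0)
    return 1 - alternative.cdf(xbar0)

side, xbar0 = best_region(0.0, 1.0, 25, 0.05)
upper_power = power(0.0, 1.0, 25, 0.05)
lower_power = power(0.0, -1.0, 25, 0.05)
```

For the assumed values the boundary depends on $a_1$ only through the side on which it lies, and the two alternatives at equal distance on either side of $a_0$ are detected with equal power by their respective regions.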

Example 26.3

Consider again the normal population when the mean is known, variance unknown, e.g.

$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\Big(-\frac{x^2}{2\sigma^2}\Big) \, dx, \qquad -\infty < x < \infty.$$

We now find, for hypotheses $\sigma = \sigma_0$ and $\sigma = \sigma_1$,

$$k = \frac{p_0}{p_1} = \Big(\frac{\sigma_1}{\sigma_0}\Big)^n \exp\Big\{-\frac{n}{2}(\bar{x}^2 + s^2)\Big(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}\Big)\Big\},$$

which yields, for the best critical region,

$$(\bar{x}^2 + s^2)(\sigma_0^2 - \sigma_1^2) \le \frac{2\sigma_0^2\sigma_1^2}{n} \log\Big\{k \Big(\frac{\sigma_0}{\sigma_1}\Big)^n\Big\} = v\,(\sigma_0^2 - \sigma_1^2), \text{ say.}$$

Thus our critical regions are defined by

$m_2' = \bar{x}^2 + s^2 \le v$ if $\sigma_0 > \sigma_1$,
$m_2' = \bar{x}^2 + s^2 \ge v$ if $\sigma_0 < \sigma_1$.

The best critical regions in the space W are thus bounded by hyperspheres centred at the
origin. Whether we take the space inside or the space outside a particular hypersphere
as the critical region depends on the alternative hypothesis. The probabilities concerned
can be evaluated directly without evaluating the constants k and v. In fact, the probability
of exceeding a given value of $nv/\sigma_0^2 = n(\bar{x}^2 + s^2)/\sigma_0^2$ is obtainable from the $\chi^2$-distribution
with n degrees of freedom, and hence the relation between v and $\alpha$ can be
ascertained from the $\chi^2$-integral.

In this particular case we may find without difficulty the power of an alternative test
which would suggest itself on intuitive grounds. Suppose we find $\chi_1^2$, and with it a bound $v'$
on $s^2$, from the $\chi^2$-distribution corresponding to n − 1 degrees of freedom and probability level $\alpha$,
and use, instead of the hyperspheres centred at the origin, those centred at the sample mean,
$s^2 \le v'$, say.

Suppose that the alternative $H_1$ is that $\sigma_1^2 = 1.1\,\sigma_0^2$. In testing $H_0$ for the alternative
$\sigma_1 > \sigma_0$ we should, for the test based on v, find $\chi_0^2$ and accept $\sigma_0$ if

$$\frac{n m_2'}{\sigma_0^2} \le \chi_0^2.$$

For instance, with n = 5, $1 - \alpha$ = 0.01 we find $\chi_0^2$ = 15.086. The probability of an error
of the second kind is

$$\int_{W - w_0} p_1 \, dx = \int_0^{\chi_0^2/1.1} dF(\chi^2),$$

i.e. is obtained from the $\chi^2$-integral with argument $\chi_0^2/1.1$ = 13.71, giving $\beta(H_1 \mid w_0)$ = 0.018.

On the other hand, had we used $\chi_1^2$ instead of $\chi_0^2$ we should have entered the table with
four degrees of freedom, giving 13.277. Divided by 1.1 this gives 12.07, resulting in a
probability of rather less than 0.017. This is the power of the second test and is lower
than that of the first test, as of course it must be since the latter has maximum power.
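
The two tabular values quoted above can be recomputed without tables. The sketch below assumes nothing beyond the figures of the example; the function `chi2_upper_tail` is our own helper, built on the standard finite series for the $\chi^2$-integral with an integral number of degrees of freedom.

```python
from math import erfc, exp, pi, sqrt

def chi2_upper_tail(x, df):
    """Probability that chi-square with integer `df` degrees of freedom exceeds x."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r < df/2} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return exp(-half) * total
    # Odd df: erfc term plus exp(-x/2) * sqrt(2x/pi) * (1 + x/3 + x^2/(3*5) + ...)
    term, total = sqrt(2.0 * x / pi), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return erfc(sqrt(half)) + exp(-half) * total

ratio = 1.1                                          # sigma_1^2 / sigma_0^2 on H1
power_origin = chi2_upper_tail(15.086 / ratio, 5)    # spheres centred at the origin
power_mean = chi2_upper_tail(13.277 / ratio, 4)      # spheres centred at the sample mean
```

Both powers come out a little below 0.018, the first being the larger, in agreement with the figures given. Against so near an alternative neither test is at all sensitive with five observations.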


Simple Hypotheses: Families of Simple Alternatives 

26.15. Consider now the case where $H_0$ is simple but $H_1$ is composite and consists
of a family of simple alternatives. The most frequently occurring case is the one in which
we have a class of simple hypotheses $\Omega$ of which $H_0$ is one and $H_1$ comprises the remainder;
for example, the hypothesis $H_0$ may be that a mean has some value $\mu_0$ and the hypothesis
$H_1$ that it has some other value unspecified.

For each of these other values we may apply the foregoing results and find for each $\alpha$
corresponding to any particular member of $H_1$, say $H_t$, a best critical region $w_t$. But this
region in general will vary from one $H_t$ to another. We obviously cannot determine a
different region for all the unspecified possibilities and are therefore led to inquire whether
there exists, among the family of best critical regions $w_t$, one which is the best for all of
them. Such a region is called the Uniformly Most Powerful and the test based on it the
Uniformly Most Powerful test, conveniently shortened to U.M.P. test.


26.16. Unfortunately, as we shall find below, the U.M.P. test does not usually
exist unless we restrict our family $\Omega$ in certain ways. Consider, for instance, the case
dealt with in Example 26.2. We found there that for $a_1 < a_0$ the best critical region for
a simple alternative was defined by

$$\bar{x} \le \bar{x}_0.$$

Now the boundaries of the regions determined by $\bar{x}$ = constant do not depend on $a_1$ and
can be found directly from the sampling distribution of $\bar{x}$ when the probability level $1 - \alpha$
is given. Consequently the regions defined by $\bar{x} \le \bar{x}_0$ are the same for all $a_1 < a_0$ and hence
the test is U.M.P. for the class of hypotheses that $a_1 < a_0$. It is difficult to see how a better
test could be devised, for, whatever $a_1$ subject to $a_1 < a_0$, the test controls errors of the first
kind and minimises those of the second.

However, if $a_1 > a_0$ the best critical regions are defined by $\bar{x} \ge \bar{x}_0$. Here again, if
our class $\Omega$ is confined to the values of $a_1$ greater than $a_0$ the test is U.M.P. But if $a_1$ can
be either greater or less than $a_0$ no U.M.P. test is possible. The reader will easily verify
for himself that the same is true for the test considered in Example 26.3.
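
The failure of either one-sided region on the wrong side of $a_0$ is easily exhibited. The sketch below again assumes the unit-variance normal parent of Example 26.2, with illustrative values $a_0 = 0$, n = 16 and a critical region of content 0.05; it tabulates the power of the lower-tail and upper-tail regions for alternatives on both sides.

```python
from statistics import NormalDist

a0, n, size = 0.0, 16, 0.05          # assumed values; size is the text's 1 - alpha
sd = n ** -0.5                       # s.d. of the sample mean
null = NormalDist(a0, sd)
lower_cut = null.inv_cdf(size)       # lower-tail region: xbar <= lower_cut
upper_cut = null.inv_cdf(1 - size)   # upper-tail region: xbar >= upper_cut

def power_lower(a1):
    """Power of the lower-tail region when the true mean is a1."""
    return NormalDist(a1, sd).cdf(lower_cut)

def power_upper(a1):
    """Power of the upper-tail region when the true mean is a1."""
    return 1 - NormalDist(a1, sd).cdf(upper_cut)

table = [(a1, power_lower(a1), power_upper(a1)) for a1 in (-1.0, -0.5, 0.5, 1.0)]
```

For alternatives below $a_0$ the lower-tail region has power above 0.05 and the upper-tail region power below it; for alternatives above $a_0$ the positions are reversed. Neither region, therefore, is best for the whole family.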


26.17. We now show formally that for a simple hypothesis depending on θ₀ (the
value taken by the parameter θ defining a family of alternatives) no U.M.P. test exists
for both positive and negative values of θ − θ₀, if the frequency function p(E | θ) is
continuous, has everywhere a continuous derivative with respect to θ which does not vanish
identically, and admits of differentiation under the sign of integration over W.

Suppose that such a test does exist. Then for any θ we have, inside w₀,

\[ p_0 \le k p_1, \]

which we may write

\[ p(E \mid \theta) \ge h(\theta)\, p(E \mid \theta_0). \]    (26.12)

Likewise, for any point Ē on the boundary of w₀ we have

\[ p(\bar{E} \mid \theta) = h(\theta)\, p(\bar{E} \mid \theta_0). \]    (26.13)

By hypothesis p is differentiable in θ and hence so is h. Moreover, as θ → θ₀, h(θ) → 1.
Hence if

\[ \Delta = \theta - \theta_0 \]

and primes denote differentiation with respect to θ, we have

\[ \begin{aligned}
h(\theta) &= 1 + \Delta\,[h'(\theta)]_{\theta_0 + g\Delta}, \qquad 0 < g < 1, \\
 &= 1 + \Delta \left[ \frac{\partial}{\partial\theta}\, \frac{p(\bar{E} \mid \theta)}{p(\bar{E} \mid \theta_0)} \right]_{\theta_0 + g\Delta} \\
 &= 1 + \Delta\, \frac{[p'(\bar{E} \mid \theta)]_{\theta_0 + g\Delta}}{p(\bar{E} \mid \theta_0)}.
\end{aligned} \]    (26.14)

Further we have

\[ p(E \mid \theta) = p(E \mid \theta_0) + \Delta\,[p'(E \mid \theta)]_{\theta_0 + g'\Delta}, \qquad 0 < g' < 1. \]    (26.15)

Substituting in (26.12) from (26.14) and (26.15), we find

\[ \Delta \left\{ [p'(E \mid \theta)]_{\theta_0 + g'\Delta} - \frac{p(E \mid \theta_0)}{p(\bar{E} \mid \theta_0)}\, [p'(\bar{E} \mid \theta)]_{\theta_0 + g\Delta} \right\} \ge 0. \]    (26.16)

This is true for any E and Ē and for all Δ, whatever its sign, and hence the expression in
curly brackets vanishes. Thus we have

\[ [p'(E \mid \theta)]_{\theta_0} - \frac{p(E \mid \theta_0)}{p(\bar{E} \mid \theta_0)}\, [p'(\bar{E} \mid \theta)]_{\theta_0} = 0. \]    (26.17)

Similarly this equation may be shown to hold outside w₀, and hence it is true throughout W.
Now we have

\[ \int_W p(E \mid \theta)\, dx = 1, \]

and hence, differentiating with respect to θ and putting θ = θ₀,

\[ \int_W [p'(E \mid \theta)]_{\theta_0}\, dx = 0. \]

Substituting from (26.17), we have

\[ \int_W p(E \mid \theta_0)\, \frac{[p'(\bar{E} \mid \theta)]_{\theta_0}}{p(\bar{E} \mid \theta_0)}\, dx = 0, \]

and hence

\[ \frac{[p'(\bar{E} \mid \theta)]_{\theta_0}}{p(\bar{E} \mid \theta_0)} = 0. \]    (26.18)

Thus, from (26.17),

\[ [p'(E \mid \theta)]_{\theta_0} = 0. \]    (26.19)

But this implies that the derivative of p with respect to θ is identically zero at θ₀, which
is contrary to hypothesis. The theorem follows.

It may be noted that in deriving (26.17) from (26.16) we used the property that Δ
may have either sign. If it can have only one sign, that is, if our class of admissible
alternatives is confined to the case when either θ < θ₀ or θ > θ₀, a U.M.P. test may exist; and
so we found in Examples 26.2 and 26.3.
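
The practical meaning of this theorem can be illustrated numerically. The following is a minimal sketch, assuming a normal population with known standard deviation and a test on its mean; the function name `power` and the numerical values are illustrative choices, not taken from the text. The argument `size` is the probability content of the critical region under the hypothesis tested (1 − α in the notation of this chapter).

```python
from statistics import NormalDist

def power(mu1, mu0=0.0, sigma=1.0, n=25, size=0.05, tail="right"):
    """Probability of rejecting H0 (mean mu0) when the true mean is mu1.

    One-tailed test on the mean of n normal observations with known sigma.
    """
    se = sigma / n ** 0.5
    null = NormalDist(mu0, se)   # distribution of the sample mean under H0
    alt = NormalDist(mu1, se)    # distribution of the sample mean under H1
    if tail == "right":
        crit = null.inv_cdf(1 - size)   # reject when the sample mean >= crit
        return 1 - alt.cdf(crit)
    crit = null.inv_cdf(size)           # reject when the sample mean <= crit
    return alt.cdf(crit)

for mu1 in (-0.5, 0.5):
    print(mu1, round(power(mu1, tail="right"), 4), round(power(mu1, tail="left"), 4))
```

Each one-tailed region is the more powerful on its own side of the hypothetical mean and the less powerful on the other, so neither can serve for alternatives of both signs.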


Best Critical Regions and Likelihood 
26.18. Since on the boundary of a best critical region we have p₀ − kp₁ = 0, that
boundary is determined by the condition that on it the ratio of the likelihoods of the two
functions corresponding to H₀ and H₁ is constant.

Consider now the case where H₁ comprises a set of alternatives varying according to
the parameter θ, H₀ being one of them. In accordance with the principle of maximum
likelihood we should obtain, as the most likely value of θ, the solution of

\[ \frac{\partial p_1}{\partial \theta} = 0, \]    (26.20)

where θ̂ is then expressed as a function of the variables. If this value is substituted in
p₁ we obtain the distribution with greatest likelihood, which may be written p(Ω max.).
The surfaces of constant likelihood are defined for this distribution by

\[ p_0 - \lambda\, p(\Omega\ \text{max.}) = 0. \]    (26.21)

Now these surfaces are, in fact, the envelopes of the family, varying with θ,

\[ p_0 - k p_1 = 0, \]    (26.22)

for to obtain the envelope we differentiate with respect to θ, giving ∂p₁/∂θ = 0, and eliminate θ,
leading back to (26.21). Thus, if there exists a best critical region (and hence a U.M.P.
test) for all permissible alternatives H₁, such a region will be the envelope with respect to
such alternatives and will therefore be identical with a region defined by (26.21); and
hence a test based on the principle of likelihood leads to best critical regions, if they exist.

If, as is more usual, there is no common best critical region, the ratio of the likelihood
of H₀ to that of any particular H₁ is k. The surface (26.21) remains the envelope of the
family of surfaces (26.22) for which k = λ.


Example 26.4 


Consider once again the normal form, where both mean μ₀ and variance σ₀² are specified
and the admissible alternatives are that they can have any values, subject of course to the
variance being positive. For any given μ₁ and σ₁ the best critical region will be given by

\[ \frac{(\bar{x}-\mu_0)^2 + s^2}{\sigma_0^2} - \frac{(\bar{x}-\mu_1)^2 + s^2}{\sigma_1^2} \ge \text{constant}. \]

This may be written in the form

\[ \frac{\sigma_1^2 - \sigma_0^2}{\sigma_0^2 \sigma_1^2} \left\{ (\bar{x}-\rho)^2 + s^2 \right\} \ge \text{constant}, \]

where

\[ \rho = \frac{\mu_0 \sigma_1^2 - \mu_1 \sigma_0^2}{\sigma_1^2 - \sigma_0^2}. \]

Thus, if σ₁ > σ₀ we have

\[ (\bar{x}-\rho)^2 + s^2 \ge v^2, \ \text{say}; \]

and if σ₁ < σ₀ we have

\[ (\bar{x}-\rho)^2 + s^2 \le v^2. \]

For any specified μ₁ and σ₁ the best critical regions are bounded by hyperspheres with radius
v√n and centre at x₁ = x₂ = ... = xₙ = ρ. Owing to the fact that ρ varies with μ₁ and
σ₁ there will not in general be a best common critical region and a U.M.P. test; and this
remains true even if we limit our alternatives to σ₁ < σ₀ and μ₁ < μ₀ or by similar
inequalities.

We may regard x̄ and s as independent variables and represent the data on a two-way
plane (x̄, s). The best critical regions are then seen to be bounded by circles with
centre (ρ, 0) and radius v. Fig. 26.2 (adapted from Neyman and Pearson, 1933c) illustrates
some of the contours for particular cases. A single curve, corresponding to a single
probability level, is shown in each case.

Cases (1) and (2): σ₁ = σ₀ and ρ = ± ∞. The best critical region lies on the right
of the line (1) if μ₁ > μ₀ and on the left of (2) if μ₁ < μ₀. This is the case discussed in
Example 26.2.

Case (3): σ₁ < σ₀, say σ₁ = ½σ₀. Then ρ = μ₁ + ⅓(μ₁ − μ₀) and the region lies
inside the semicircle marked (3).

Case (4): σ₁ < σ₀ and μ₁ = μ₀. The region is inside the semicircle (4).

Case (5): σ₁ > σ₀ and μ₁ = μ₀. The region is outside the semicircle (5).

There is evidently no common best critical region for these cases. The regions of
acceptance, however, may have a common part, centred round the value (μ₀, σ₀), and we
should expect them to do so. Let us find the envelope of the best critical regions, which
is, of course, the same as that of the regions of acceptance. The likelihood ratio is

Fig. 26.2.—Contours of Constant Likelihood in a Two-dimensional Case. (See text.)


\[ \frac{p_0}{p_1} = \left( \frac{\sigma_1}{\sigma_0} \right)^{n} \exp\left[ -\frac{n}{2} \left\{ \frac{(\bar{x}-\mu_0)^2 + s^2}{\sigma_0^2} - \frac{(\bar{x}-\mu_1)^2 + s^2}{\sigma_1^2} \right\} \right]. \]

The partial differentials of its logarithm with respect to μ₁ and σ₁ equated to zero give

\[ \frac{n}{\sigma_1} - \frac{n\{(\bar{x}-\mu_1)^2 + s^2\}}{\sigma_1^3} = 0, \qquad \frac{n(\bar{x}-\mu_1)}{\sigma_1^2} = 0, \]

whence we find μ₁ = x̄ and σ₁ = s, and the envelope is

\[ \left( \frac{\bar{x}-\mu_0}{\sigma_0} \right)^2 - \log \left( \frac{s}{\sigma_0} \right)^2 + \left( \frac{s}{\sigma_0} \right)^2 = 1 - \frac{2}{n} \log k. \]

The dotted curve in Fig. 26.2 shows one such envelope. It touches the boundaries of all
the critical regions which have the same likelihood-ratio k. The space inside may be
regarded as a "good" region of acceptance and the space outside accordingly as a good
critical region.

There is no best region for all alternatives, but the regions determined by envelopes
of likelihood-ratio regions effect a sort of compromise by picking out and amalgamating
parts of critical regions which are best for individual alternatives.
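
The geometry of this example can be checked numerically. The sketch below is illustrative only: the function names `circle_centre`, `log_ratio` and `envelope_log_ratio` are hypothetical, and s is taken as the standard deviation of the sample with divisor n. It computes the centre ρ of the circular boundary and confirms that the likelihood ratio p₀/p₁ is smallest when the alternative is placed at μ₁ = x̄, σ₁ = s, which is the property that generates the envelope.

```python
import math

def circle_centre(mu0, sigma0, mu1, sigma1):
    """Centre rho of the circular boundary in the (mean, s) plane.

    Requires sigma1 != sigma0; for equal variances the boundary is a line.
    """
    return (mu0 * sigma1**2 - mu1 * sigma0**2) / (sigma1**2 - sigma0**2)

def log_ratio(xbar, s, n, mu0, sigma0, mu1, sigma1):
    """log(p0/p1) for a normal sample summarised by its mean xbar and s."""
    q0 = ((xbar - mu0)**2 + s**2) / sigma0**2
    q1 = ((xbar - mu1)**2 + s**2) / sigma1**2
    return n * math.log(sigma1 / sigma0) - 0.5 * n * (q0 - q1)

def envelope_log_ratio(xbar, s, n, mu0, sigma0):
    """log(p0/p1) at the maximum-likelihood alternative mu1 = xbar, sigma1 = s."""
    return log_ratio(xbar, s, n, mu0, sigma0, xbar, s)

# Case (3) of the text: sigma1 = sigma0 / 2 gives rho = mu1 + (mu1 - mu0) / 3.
print(circle_centre(mu0=0.0, sigma0=1.0, mu1=3.0, sigma1=0.5))
```

No particular alternative can give a smaller ratio at a sample point than the envelope value there, which is why the envelope touches each individual boundary from one side.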


Example 26.5 


In the previous example we have supposed that the sample space W was the same for
all admissible alternatives. This is quite legitimate, for we can always regard the domain
of variation as infinite by supposing that p = 0 outside the range of the frequency-distribution
of the variates. In the normal case, of course, p does not vanish anywhere, so that
we are compelled to consider W as infinite.

When, however, the sample-space for non-vanishing p is bounded, special circumstances
may arise, and it is occasionally necessary to consider separately the different
discriminating regions. For instance, if the sample-spaces corresponding to H₀ and H₁
are W₀ and W₁, it may happen that W₀ and W₁ have no common part when both p₀ and
p₁ are greater than zero. If so, we can distinguish between H₀ and H₁ with certainty.
If there is a common region W₀₁, then W₁ − W₀₁ should be included in the best critical
region, for to do so does not increase the probability of errors of the first kind. But it does
not follow that this should constitute the whole of the critical region, for we might then commit too
many errors of the second kind, i.e. accept H₀ too often when H₁ is true. We may then
wish to add to W₁ − W₀₁ a region w₀₁, making w₀ altogether, such that w₀₁ lies inside W₀₁
and p₀(E ∈ w₀₁) = p₀(E ∈ w₀) = 1 − α. This controls the first kind of error to level α
and reduces the second kind of error.

Consider the population

\[ p(x) = \frac{1}{b}, \qquad a - \tfrac{1}{2}b \le x \le a + \tfrac{1}{2}b, \]
\[ \phantom{p(x)} = 0, \qquad \text{elsewhere.} \]

Suppose a sample of n to have been drawn from a population of this kind where b is known.
We wish to test whether a has some value a₀ as against the alternative a₁.

The sample-spaces W₀ and W₁ are hypercubes centred at a₀ and a₁. If they have
a common part W₀₁, the probabilities p₀ and p₁ in that part are both proportional to the
volume and p₁/p₀ = 1 everywhere in the region. If, then, we take any region w₀₁ of content
1 − α in W₀₁ and add it to W₁ − W₀₁ we get a best critical region, and there are clearly
infinitely many such.

For the admissible alternatives a₁ the hypercube W₁ will move along the long diagonal
x₁ = x₂ = ... = xₙ as a₁ varies, and we cannot always find a common region of size 1 − α
to form w₀₁. By taking such a region as a hypercube of side b(1 − α)^{1/n}, however, fitted
into one of the corners of W₀ lying on the long diagonal, we "nearly" obtain such an object,
since this region provides what is required so long as W₀ and W₁ have a common part of
content 1 − α. Which corner we choose depends on whether the hypothesis is a₁ > a₀
or a₁ < a₀.
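
The content of the corner hypercube is easily verified. The following is a minimal sketch under the assumptions of the example (a rectangular population of known range b); the function names `corner_region_content` and `max_shift` are hypothetical, and the numerical values are chosen only for illustration.

```python
def corner_region_content(n, b, alpha):
    """Probability under H0 of the corner hypercube of side b*(1-alpha)**(1/n).

    Each of the n co-ordinates is uniform on an interval of length b, so the
    content is the n-th power of the fraction of that interval covered.
    """
    side = b * (1 - alpha) ** (1 / n)
    return (side / b) ** n

def max_shift(n, b, alpha):
    """Largest |a1 - a0| for which the shifted hypercube W1 still contains the corner cube."""
    return b - b * (1 - alpha) ** (1 / n)

print(corner_region_content(n=5, b=2.0, alpha=0.95))
print(max_shift(n=5, b=2.0, alpha=0.95))
```

For shifts larger than the second quantity the common part of the two sample-spaces has content less than 1 − α, and this is the case in which the corner hypercube ceases to provide what is required.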
Relation between U.M.P. Tests and Sufficient Estimators

26.19. It was thought at one time that the existence of a set of U.M.P. tests for
a continuous range of admissible alternatives involved the existence of a sufficient estimator
for the parameter concerned. This does not appear to be true in full generality, but is
so in nearly all the cases occurring in statistical practice. We will prove a theorem on the
subject:

If a system of U.M.P. tests exists and if any point in the sample-space lies on the
boundary of a best critical region, then a sufficient estimator exists for the parameter whose
variation provides the admissible alternatives.*

It is enough to show that for an arbitrary point we have

\[ p_1(E) = h(t, \theta)\, p_0(E), \]    (26.23)

for then t is sufficient for θ by definition. Now we know that on the boundary of a critical
region we have

\[ \frac{p_1(E)}{p_0(E)} = h, \]

where h varies with the x's and with θ. We show that h has the form h(t, θ) by defining
a function t and showing that if t has the same value at any two points E₁ and E₂, then

\[ \frac{p_1(E_1)}{p_0(E_1)} = \frac{p_1(E_2)}{p_0(E_2)} = h, \ \text{say}, \]

for all θ.


* The theorem remains true if there is a set of points of measure zero for which the condition as to
boundaries is not fulfilled. It is also true for several parameters, as may be seen by an easy
generalisation of the argument. See Neyman and Pearson (1936a).


26.20. For this purpose we require a lemma to the following effect: if a set of U.M.P.
tests exists, it will be said to be ordered if the condition α₁ > α₂ implies that the critical
region w(α₁) is included in the region w(α₂); and if a set of U.M.P. tests exists but is not
ordered we can always find another set which is.

w(α₁) and w(α₂) may include parts of W where p₀ vanishes. Let the remaining parts
be v(α₁) and v(α₂) and, if v₀ is the common part of these regions, write

\[ v(\alpha_1) = v_0 + v', \qquad v(\alpha_2) = v_0 + v'', \]    (26.24)

where v₀, v′ and v″ have no common points. Now for any value of θ and for any E in w(α₁),
and therefore in v′, there is an h₁ such that

\[ p_1(E) \ge h_1\, p_0(E) \ \text{in } v', \qquad p_1(E) \le h_1\, p_0(E) \ \text{outside, and therefore in } v''. \]

Similarly, within w(α₂) and hence within v″ we have an h₂ such that

\[ p_1(E) \ge h_2\, p_0(E) \ \text{in } v'', \qquad p_1(E) \le h_2\, p_0(E) \ \text{in } v'. \]

It follows that, from the inequalities deriving from v″, h₁ ≥ h₂, and similarly, from v′,
h₂ ≥ h₁. Hence h₁ = h₂ = h, say, and

\[ p_1(E) = h\, p_0(E) \]    (26.25)

within v′ and v″ for any θ.

Now take

\[ u(\alpha_1) = v_0 + u'', \]    (26.26)

where u″ is a part of v″ such that

\[ \int_{u(\alpha_1)} p_0\, dx = 1 - \alpha_1. \]    (26.27)

This is always possible, for the integral of p₀ over v₀ + v″ is 1 − α₂, which is greater than
1 − α₁. It follows from (26.27) and the first equation of (26.24) that

\[ \int_{u''} p_0\, dx = \int_{v'} p_0\, dx. \]    (26.28)

Now put

\[ w'(\alpha_1) = W_0 + u(\alpha_1) = W_0 + v_0 + u'', \]

where W₀ is the part of W for which p₀ = 0. Then from (26.27)

\[ \int_{w'(\alpha_1)} p_0\, dx = 1 - \alpha_1. \]

Further, w′(α₁) is a best critical region with respect to admissible alternatives, for (26.25)
and (26.28) imply that

\[ \int_{u''} p_1\, dx = \int_{v'} p_1\, dx, \]

and hence

\[ \int_{w'(\alpha_1)} p_1\, dx = \int_{w(\alpha_1)} p_1\, dx. \]

Finally, w′(α₁) is wholly included in w(α₂).
We have therefore replaced the region w(α₁) by another region w′(α₁) with the same
properties except that it is included in w(α₂). The lemma follows.


26.21. To return now to the main proposition, let E be any point of W. If it belongs
to only one boundary of a best critical region with content 1 − α we put t(E) = 1 − α.
If it belongs to more than one, we put t(E) equal to the mean between the upper and lower
bounds of values of 1 − α for which the boundaries include E. In virtue of the lemma,
this implies that whatever the value of 1 − α between these bounds, the corresponding
boundary must contain E.

Thus t is defined everywhere. Further, if it has the same value at two points E₁ and
E₂ these points must lie on the same boundary. It follows that on this boundary

\[ \frac{p_1(E_1)}{p_0(E_1)} = \frac{p_1(E_2)}{p_0(E_2)}, \]

and hence the theorem is proved.
The converse is not generally true, but one has to exercise some ingenuity and import 
some artificiality to construct examples where it fails. Cf. Exercises 26.3 and 26.4. 
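
The connection between best critical regions and sufficiency may be illustrated by the normal population with known variance, for which the sample mean is sufficient for the population mean. The sketch below is illustrative only; the function name `log_lr` and the two samples are hypothetical. It shows that the ratio p₁/p₀ takes the same value at two sample points having the same mean, whatever alternative is chosen.

```python
import math

def log_lr(sample, mu0, mu1, sigma=1.0):
    """log(p1/p0) for independent normal observations with known sigma."""
    return sum(((x - mu0)**2 - (x - mu1)**2) / (2 * sigma**2) for x in sample)

a = [0.2, 1.4, 2.9]   # mean 1.5
b = [1.0, 1.5, 2.0]   # mean 1.5, but different individual values

for mu1 in (-1.0, 0.5, 2.0):
    print(mu1, log_lr(a, 0.0, mu1), log_lr(b, 0.0, mu1))
```

The two points therefore lie on the same boundary for every alternative, which is the property used in the proof to define the function t.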


Composite Hypotheses 


26.22. We shall consider a class Ω of admissible hypotheses depending on r + s
parameters θ₁ . . . θ_r . . . θ_{r+s}, and shall regard the hypothesis H₀ under test as one of
this class. A composite hypothesis of r degrees of freedom is one for which s of the
parameters, say θ_{r+1} . . . θ_{r+s}, are specified, the hypotheses determining the distribution
apart from the unspecified parameters. For example, the hypothesis that a population
is normal with specified mean, nothing being supposed about the variance, is a composite
hypothesis of one degree of freedom. It will be assumed that any admissible simple
alternative is given by specifying the other r parameters θ₁ . . . θ_r and that there is a common
sample-space W for all such alternatives.


Regions Similar to the Sample Space 


26.23. In order to test the composite hypothesis H₀ we need in the first place to
control errors of the first kind by determining a critical region w, such that

\[ \int_w p_0\, dx = 1 - \alpha. \]    (26.29)

This, however, differs from the simple case in that p₀ can vary according to the unknown
parameters, and to be certain of controlling the error we must be able to find w such that
(26.29) is true whatever θ₁ . . . θ_r. If this can be done we shall call the region w similar
to the sample-space W and shall speak of 1 − α as its size.

The problem of testing composite hypotheses then becomes one of (a) finding the
similar regions, and (b) selecting from among those regions the one which minimises the
second kind of error for a simple admissible alternative H₁. If this is the same for all
H₁ we shall have a common best critical region.


26.24. We consider in the first place the composite hypothesis with one degree of
freedom. The general problem of finding similar regions in such a case has not been solved,
but a solution is possible in one important class of case, namely, that for which

(a) p₀ is indefinitely differentiable with respect to θ₁ for almost all values of θ₁,

(b) the function p₀ obeys the relation

\[ \phi' = A + B\phi, \]    (26.30)

where

\[ \phi = \frac{\partial}{\partial\theta_1} \log p_0, \qquad \phi' = \frac{\partial\phi}{\partial\theta_1}, \]    (26.31)

and A and B depend on θ₁ but not on the x's. In particular the normal distribution
is of this type.

Under conditions (a) and (b) it follows that for w to be similar to W it is necessary and
sufficient that

\[ \int_w \frac{\partial^k p_0}{\partial\theta_1^k}\, dx = 0, \qquad k = 1, 2, \ldots \]    (26.32)

Let w be a region for which (26.32) is true. Then for k = 1 and 2 we have

\[ \int_w p_0\, \phi\, dx = 0, \qquad \int_w p_0\, (\phi^2 + \phi')\, dx = 0. \]

In virtue of (26.30), this last may be written

\[ \int_w p_0\, (\phi^2 + A + B\phi)\, dx = 0, \]

whence

\[ \int_w p_0\, \phi^2\, dx = -A \int_w p_0\, dx = -A(1 - \alpha). \]    (26.33)

Differentiating (26.33) with respect to θ₁ and using previous results, we find, writing A′ for
the derivative of A with respect to θ₁,

\[ \int_w p_0\, \phi^3\, dx = (2AB - A')(1 - \alpha), \]    (26.34)

and generally

\[ \int_w p_0\, \phi^k\, dx = (1 - \alpha)\, \psi_k(\theta_1), \]    (26.35)

where ψ_k(θ₁) is a function of θ₁ only, and is therefore independent of w. Now (26.32) is
true for W = w, and we find

\[ \int_W p_0\, \phi^k\, dx = \psi_k(\theta_1), \]    (26.36)

so that

\[ \frac{1}{1 - \alpha} \int_w p_0\, \phi^k\, dx = \int_W p_0\, \phi^k\, dx. \]    (26.37)

Now consider the random variable φ. Since p₀ integrated through w is equal to 1 − α,
we may regard p₀/(1 − α) as a frequency function defined in w. It follows from (26.37) that
the moments of φ in this domain are the same as those of φ in W. Consequently, if the
moments determine the distribution uniquely, the distributions of φ are identical.

Hence we may use the hypersurfaces φ = constant to set up similar regions. The
space W may be imagined as composed of shells of infinite thinness bounded by these
hypersurfaces. If we determine an "area" on one of these shells equal to 1 − α times
its area in W, the totality of such areas will constitute a region w of size 1 − α; and since
this will be so irrespective of θ₁, the region w is similar to W.


26.25. When similar regions are determined by the above method we have to find
the best critical region from among them. Let H₁ be a simple admissible alternative.
We require to find from the regions w a region w₀ such that

\[ \int_{w_0} p_1\, dx = \text{maximum}. \]    (26.38)

We now show that this is equivalent to maximising

\[ \int_{w(\phi)} p_1\, dw(\phi), \]    (26.39)

subject to

\[ \int_{w(\phi)} p_0\, dw(\phi) = (1 - \alpha) \int_{W(\phi)} p_0\, dW(\phi). \]    (26.40)

Here w(φ) means the element of w for constant φ, the "shell" of the previous section.
The object of this is to reduce our present case to that of simple hypotheses. We take
φ as a new variable and consider together the remaining variables (which amounts to
determining similarity of w and W in each separate shell between φ and φ + dφ, as in the previous
section), and are thus left with regions dependent on φ. Equation (26.39) then requires
that the probability of the second kind of error in each shell must be a minimum, subject
to the control of the first kind asserted by (26.40).

Suppose that (26.39) were not maximised. There would then exist a set of values of
φ for each of which we could determine a region v(φ) such that

\[ \int_{v(\phi)} p_0\, dv(\phi) = (1 - \alpha) \int_{W(\phi)} p_0\, dW(\phi) \]    (26.41)

and

\[ \int_{v(\phi)} p_1\, dv(\phi) > \int_{w_0(\phi)} p_1\, dw_0(\phi). \]    (26.42)

Let E be this set of values of φ and CE the remaining set. We prove our result by
obtaining a contradiction, namely by defining a region v which is similar to W, and such that

\[ \int_v p_1\, dx > \int_{w_0} p_1\, dx, \]    (26.43)

which contradicts (26.38).
Take as v the shells of hypersurfaces (1) in CE which are identical with w₀(φ) and
(2) in E which satisfy (26.42). Now

\[ \int_v p_1\, dx = \int_{E + CE} d\phi \int_{v(\phi)} p_1\, dv(\phi) \]

and

\[ \int_{w_0} p_1\, dx = \int_{E + CE} d\phi \int_{w_0(\phi)} p_1\, dw_0(\phi). \]

Hence

\[ \int_v p_1\, dx - \int_{w_0} p_1\, dx = \int_E d\phi \left\{ \int_{v(\phi)} p_1\, dv(\phi) - \int_{w_0(\phi)} p_1\, dw_0(\phi) \right\} > 0, \]    (26.44)

which is the contradiction required.


26.26. Thus our problem is reduced to that of finding, in the shells φ = constant,
portions w₀(φ) which maximise the integral of p₁. We have, so to speak, brought the
problem down one dimension by locating it in shells instead of dealing with it throughout
the spaces w and W. It now becomes that of a simple hypothesis in (n − 1) dimensions,
and the best critical region is the one for which

\[ p_0 \le k\, p_1, \]    (26.45)

where k is a function of φ. The sum of these regions for the various values of φ gives us
the complete solution to the problem, and if this sum has boundaries which are independent
of H₁ we have a common best critical region and a U.M.P. test.


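Example 26.6: "Student's" Hypothesis
A single sample is taken from a normal population

\[ dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\} dx, \]

with unspecified σ. We have then one degree of freedom, θ₁ = σ, and the hypothesis H₀
is that μ = μ₀, say.
We find

\[ \phi = \frac{\partial \log p_0}{\partial\sigma} = -\frac{n}{\sigma} + \frac{\Sigma (x-\mu_0)^2}{\sigma^3}, \]

\[ \phi' = \frac{\partial\phi}{\partial\sigma} = \frac{n}{\sigma^2} - \frac{3\,\Sigma (x-\mu_0)^2}{\sigma^4} = -\frac{2n}{\sigma^2} - \frac{3\phi}{\sigma} = \frac{n}{\sigma^2} - \frac{3n}{\sigma^4} \{ (\bar{x}-\mu_0)^2 + s^2 \}. \]

Condition (26.30) is satisfied, and φ is constant over the hypersurfaces

\[ \Sigma (x-\mu_0)^2 = n \{ (\bar{x}-\mu_0)^2 + s^2 \} = \text{constant}. \]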
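
The relation φ′ = A + Bφ with A = −2n/σ² and B = −3/σ can be confirmed numerically. The following is a minimal sketch, not part of the original argument: the function names and the sample values are hypothetical, and the derivative is approximated by a central difference.

```python
import math

def phi(sample, mu0, sigma):
    """phi = d(log p0)/d(sigma) for a normal sample with mean mu0 under H0."""
    n = len(sample)
    q = sum((x - mu0)**2 for x in sample)
    return -n / sigma + q / sigma**3

def phi_prime_numeric(sample, mu0, sigma, h=1e-6):
    """Central-difference approximation to d(phi)/d(sigma)."""
    return (phi(sample, mu0, sigma + h) - phi(sample, mu0, sigma - h)) / (2 * h)

def phi_prime_relation(sample, mu0, sigma):
    """Right-hand side A + B*phi with A = -2n/sigma^2 and B = -3/sigma."""
    n = len(sample)
    return -2 * n / sigma**2 - 3 * phi(sample, mu0, sigma) / sigma

x = [1.2, 0.4, 2.5, 1.9]
print(phi_prime_numeric(x, 1.0, 1.3), phi_prime_relation(x, 1.0, 1.3))
```

Since A and B involve σ and n but not the observations, the condition (26.30) holds for every sample.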


The hypersurfaces are hyperspheres in W. To construct a similar region we have merely
to pick out a region of size 1 − α on each shell and to amalgamate them. In our present
case this is particularly easy because p₀ is constant over the shells and we need only pick
out areas on each shell bearing to the area of the hypersphere the ratio 1 − α.

These areas need not be of the same shape or similarly situated. By selecting them
in different ways an infinite variety of regions may be constructed. We have to find the
best for an alternative simple hypothesis σ = σ₁, μ = μ₁.

The condition (26.45) becomes

\[ \frac{1}{\sigma_1^n} \exp\left[ -\frac{n}{2\sigma_1^2} \{ (\bar{x}-\mu_1)^2 + s^2 \} \right] \ge \frac{1}{k\,\sigma^n} \exp\left[ -\frac{n}{2\sigma^2} \{ (\bar{x}-\mu_0)^2 + s^2 \} \right]. \]

As we are dealing with regions which are similar with regard to σ, we may put σ = σ₁
and find

\[ \bar{x}\,(\mu_1 - \mu_0) \ge \tfrac{1}{2}(\mu_1^2 - \mu_0^2) - \frac{1}{n}\,\sigma_1^2 \log k = (\mu_1 - \mu_0)\,k_1, \ \text{say}, \]

where k₁ = k₁(φ). Thus we find, for the boundary of w₀(φ),

\[ \text{if } \mu_1 > \mu_0, \quad \bar{x} \ge k_1(\phi); \qquad \text{if } \mu_1 < \mu_0, \quad \bar{x} \le k_1(\phi), \]

where k₁ has to be chosen so as to satisfy

\[ \int_{w_0(\phi)} p_0\, dw(\phi) = (1 - \alpha) \int_{W(\phi)} p_0\, dW(\phi). \]

Thus on any particular shell the "cap" cut off by the hyperplane x̄ = constant must have
area equal to 1 − α times that of the shell and hence must subtend the same solid angle at
the origin. Consequently the boundaries lie on a right hypercircular cone through the point
whose co-ordinates are all equal to μ₀, and whose axis is perpendicular to x̄ = 0, namely
the line x₁ = x₂ = ... = xₙ.
For each α there will be a different cone. If μ₁ > μ₀ the cones will be in the positive
quadrant and in the contrary case in the negative quadrant.

Furthermore, these regions are independent of μ₁. Thus for the class of hypothesis
μ₁ > μ₀ or μ₁ < μ₀ (but not both together) the common best critical regions and U.M.P.
tests exist.

Finally we have to evaluate α in terms of the sample values determining the critical
cones. We have already seen in Example 10.6 (vol. I, p. 239) that if z = (x̄ − μ₀)/s the
frequency inside the cone is

\[ \frac{1}{B\left( \frac{n-1}{2}, \frac{1}{2} \right)} \int \frac{dz}{(1+z^2)^{n/2}}, \]

the integral being taken over the range of z inside the cone.
Thus "Student's" test, which we have previously considered on more or less intuitive
grounds, is now seen to be the best in the sense of the theory herein developed, for the
admissible class μ₁ > μ₀ or for that μ₁ < μ₀.
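
The similarity of the conical regions with respect to σ rests on the fact that the ratio z is unaffected by a change of scale about μ₀. The sketch below illustrates this point only and is not a substitute for the proof; the function names `student_z` and `rescale` are hypothetical, and s is computed with divisor n.

```python
import math

def student_z(sample, mu0):
    """The ratio z = (mean - mu0) / s, with s computed using divisor n."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean)**2 for x in sample) / n)
    return (mean - mu0) / s

def rescale(sample, mu0, c):
    """Stretch the sample about mu0 by a factor c > 0 (a change of sigma under H0)."""
    return [mu0 + c * (x - mu0) for x in sample]

x = [1.8, 0.9, 2.4, 1.1, 1.6]
print(student_z(x, 1.0), student_z(rescale(x, 1.0, 7.5), 1.0))
```

A region defined by an inequality on z therefore contains the same proportion of the frequency whatever σ may be, which is what the cone construction requires.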


Example 26.7 


Consider a sample from the normal population with unspecified mean, the hypothesis
being that σ = σ₀. We now find

\[ \phi = \frac{\partial \log p_0}{\partial\mu} = \frac{n(\bar{x} - \mu)}{\sigma_0^2}, \qquad \phi' = -\frac{n}{\sigma_0^2}, \]

so that (26.30) is satisfied.
The hypersurfaces φ = constant are the hyperplanes x̄ = constant, and any regions
of size 1 − α on these hyperplanes will provide similar regions w. The condition p₁ ≥ p₀/k
will be found to reduce to

\[ s^2 (\sigma_0^2 - \sigma_1^2) \le -(\bar{x} - \mu)^2 (\sigma_0^2 - \sigma_1^2) + 2\sigma_0^2 \sigma_1^2 \left\{ \log \frac{\sigma_0}{\sigma_1} + \frac{1}{n} \log k \right\} = (\sigma_0^2 - \sigma_1^2)\, k_1, \ \text{say}. \]

If σ₁ > σ₀ we have s² ≥ k₁(φ),
and if σ₁ < σ₀ we have s² ≤ k₁(φ).

Since s² is independent of x̄, k₁ will be a function of α and n only. The best critical
regions are those given by s² ≥ s₀² and s² ≤ s₀² as the case may be, and the appropriate
values of s₀ corresponding to α may be found from the known distribution of s². The
critical regions are hypercylinders, and again there are two sets of best common critical
regions, according as σ₁ > σ₀ or σ₁ < σ₀.


Composite Hypotheses: Several Degrees of Freedom 

26.27. As a preliminary to extending the theory for one degree of freedom to the
case of several degrees, we note that if a region w is similar to W with regard to θ₁ . . . θ_r
jointly, then it is so for each of them separately; and conversely. The direct result is
obvious and the converse follows in this way (we need prove it only for r = 2 because
the rest follows step by step). If

\[ \int_w p_0\, dx = 1 - \alpha \]

is true for θ₂, θ₃ . . . θ_r independently of θ₁, and for θ₁, θ₃ . . . θ_r independently of θ₂,
then it is true for any values of θ₁ and θ₂ and any other fixed values of θ₃ . . . θ_r; and
hence it is true independently of θ₁ and θ₂ together.

26.28. An additional preliminary requirement is the concept of independence of
a family of surfaces of a parameter. Suppose

\[ f_i(x_1, \ldots, x_n, \theta) = C_i \qquad (i = 1, \ldots, k) \]    (26.46)

represents a family of surfaces, where θ and the C's are variable parameters. Let
S(θ, C₁ . . . C_k) be the intersection of these surfaces, or, if k = 1, the surfaces themselves.
Consider the family obtained by fixing θ and allowing the C's to vary. Then if any surface
of this family for θ₁ can also be obtained from a second family for θ₂ we shall say that the
family is independent of θ. We get the same aggregate of intersections however θ is chosen.
For example, if

\[ f_1 = (x_1 - \theta)^2 + (x_2 - \theta)^2 + (x_3 - \theta)^2 = C_1 \]

and

\[ f_2 = x_1 + x_2 + x_3 = C_2, \]

the family S consists of circles in planes at right angles to the line x₁ = x₂ = x₃ and having
their centres on that line. This is true however θ is chosen, and S is therefore
independent of θ.
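
The independence asserted in this example may be verified directly. The short working below is an added illustration; the primed symbols θ′ and C₁′ are introduced here for the purpose and do not occur in the text.

```latex
% Expand f_1 and use f_2 = C_2:
\[ f_1 = \sum_{i=1}^{3} x_i^2 - 2\theta \sum_{i=1}^{3} x_i + 3\theta^2 = C_1
   \quad\Longrightarrow\quad
   \sum_{i=1}^{3} x_i^2 = C_1 + 2\theta C_2 - 3\theta^2 . \]
% The intersection is thus the circle cut from the sphere sum x_i^2 = const
% by the plane sum x_i = C_2. The same circle arises for another value
% theta' of the parameter provided C_1 is replaced by
\[ C_1' = C_1 + 2(\theta - \theta')\,C_2 - 3(\theta^2 - \theta'^2) . \]
```

Thus a change in θ merely relabels the circles by different values of C₁, and the aggregate of intersections is unaltered.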


26.29. Under certain restrictive conditions similar to those of 26.24 it is now possible
to find solutions to the problem of determining best critical regions. We assume

(1) that ∂^k p₀/∂θ_j^k exists almost everywhere for all k and j = 1 . . . r;

(2) that if

\[ \phi_j = \frac{\partial}{\partial\theta_j} \log p_0 \quad \text{and} \quad \phi_j' = \frac{\partial\phi_j}{\partial\theta_j}, \]

then

\[ \phi_j' = A_j + B_j \phi_j; \]    (26.47)

(3) that the family of surfaces given by the intersections of φ_j = C_j is independent of
θ_j for j = 1 . . . r.

Subject to these conditions (which are sufficient but not necessary) similar regions exist.
Consider any two surfaces φ₁ and φ₂. Since w is similar with respect to θ₁ alone, we may
find surfaces φ₁ = constant and

\[ \int_{w(\phi_1)} p_0\, dw(\phi_1) = (1 - \alpha) \int_{W(\phi_1)} p_0\, dW(\phi_1). \]    (26.48)

In accordance with assumption (3), the family of surfaces φ₁ = C₁ is independent of θ₂.
Thus if θ₂ varies, W(φ₁) and w(φ₁) will not vary, though perhaps they may correspond to
other values of C₁. Furthermore, (26.48) is true regardless of θ₂. Hence within the shell
φ₁ = constant we can repeat the analysis used for one degree of freedom. We find that
the necessary and sufficient condition for w to be similar to W with regard to both θ₁ and θ₂ is

\[ \int_{w(\phi_1, \phi_2)} p_0\, dw(\phi_1, \phi_2) = (1 - \alpha) \int_{W(\phi_1, \phi_2)} p_0\, dW(\phi_1, \phi_2), \]    (26.49)

where W(φ₁, φ₂) is the intersection of φ₁ = C₁, φ₂ = C₂ for any values of C₁ and C₂; and similarly
for w.

As before, the most general region w is obtained by amalgamating the portions of size
(1 − α) on the intersections of φ₁ and φ₂. The generalisation to r degrees of freedom is
immediate. It also follows in the usual way that the best critical region is the one for
which

\[ \int_{w_0} p_1\, dx \ge \int_v p_1\, dx, \]

v being any other region of size 1 − α; and w₀ is defined by

\[ p_1 \ge h(\phi_1 \ldots \phi_r)\, p_0. \]    (26.50)

The following examples will illustrate the theory.


Example 26.8. Ratio of Two Variances 
Suppose we have two samples of n₁, n₂ members from independent normal populations
whose means and variances are unknown. The joint distribution may be expressed as

\[ dF = \frac{1}{(2\pi)^{(n_1+n_2)/2}\, \sigma_1^{n_1} \sigma_2^{n_2}} \exp\left[ -\frac{n_1}{2\sigma_1^2} \{ (\bar{x}_1 - \mu_1)^2 + s_1^2 \} - \frac{n_2}{2\sigma_2^2} \{ (\bar{x}_2 - \mu_2)^2 + s_2^2 \} \right] dx. \]

Consider the composite hypothesis σ₁ = σ₂ = σ, say. This has three degrees of freedom,
for μ₁, μ₂ and σ are unspecified. As the alternative H₁ we will take

\[ \theta_1 = \mu_1, \quad \theta_2 = \mu_2 - \mu_1 = b_1, \quad \theta_3 = \sigma_1, \quad \theta_4 = \frac{\sigma_2}{\sigma_1}, \]

and for H₀ itself

\[ \theta_1 = \mu, \quad \theta_2 = b, \quad \theta_3 = \sigma, \quad \theta_4 = 1. \]

We have first to consider whether the conditions of 26.29 are satisfied.
(1) Evidently p₀ is differentiable for all parameters any number of times.

(2) We find:

\[ \phi_1 = \frac{\partial}{\partial\mu} \log p_0 = \frac{1}{\sigma^2} \{ n_1(\bar{x}_1 - \mu) + n_2(\bar{x}_2 - \mu - b) \}, \]

\[ \phi_2 = \frac{\partial}{\partial b} \log p_0 = \frac{n_2}{\sigma^2} (\bar{x}_2 - \mu - b), \]

\[ \phi_3 = \frac{\partial}{\partial\sigma} \log p_0 = -\frac{n_1 + n_2}{\sigma} + \frac{1}{\sigma^3} \{ n_1(\bar{x}_1 - \mu)^2 + n_2(\bar{x}_2 - \mu - b)^2 + n_1 s_1^2 + n_2 s_2^2 \}, \]

and (26.47) is seen to be satisfied.
(3) The hypersurfaces φ₁ = C₁ are evidently equivalent to

\[ n_1 \bar{x}_1 + n_2 \bar{x}_2 = C_1', \]

where C₁′ is an arbitrary parameter. The hypersurfaces φ₂ = C₂ give similarly

\[ \bar{x}_2 = C_2'. \]

Both these are independent of θ₃ and their intersections, namely x̄₁ = constant, x̄₂ = constant,
are independent of θ₃. Thus the third condition is fulfilled and we may apply the
foregoing theory.
The hypersurfaces φ₁ = constant, φ₂ = constant, φ₃ = constant are equivalent to

\[ \bar{x}_1 = \text{constant}, \qquad \bar{x}_2 = \text{constant}, \qquad n_1 s_1^2 + n_2 s_2^2 = \text{constant} = (n_1 + n_2)\, s_a^2, \ \text{say}. \]

The element w₀ is part of W(φ₁, φ₂, φ₃) within which

\[ p_1 \ge p_0 / h(\bar{x}_1, \bar{x}_2, s_a) \]
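and this condition, by reference to the frequency function, becomes

\[ \frac{1}{\sigma_1^{n_1+n_2}\, \theta_4^{n_2}} \exp\left[ -\frac{1}{2\sigma_1^2} \{ n_1(\bar{x}_1 - \mu_1)^2 + n_1 s_1^2 \} - \frac{1}{2\theta_4^2 \sigma_1^2} \{ n_2(\bar{x}_2 - \mu_1 - b_1)^2 + n_2 s_2^2 \} \right] \]
\[ \qquad \ge \frac{1}{h\, \sigma^{n_1+n_2}} \exp\left[ -\frac{1}{2\sigma^2} \{ n_1(\bar{x}_1 - \mu)^2 + n_1 s_1^2 + n_2(\bar{x}_2 - \mu - b)^2 + n_2 s_2^2 \} \right]. \]

Since the region w is independent of μ, b and σ, we may put them respectively equal to
μ₁, b₁ and σ₁ and hence find for the condition

\[ n_2 (1 - \theta_4^2) \{ (\bar{x}_2 - \mu_1 - b_1)^2 + s_2^2 \} \le 2\sigma_1^2 \theta_4^2 (\log h - n_2 \log \theta_4). \]

Since this inequality holds good on x̄₂ = constant it contains only one variable s₂² and we
accordingly find two cases:

If θ₄ = σ₂/σ₁ > 1 the best region is defined by s₂² ≥ h′(x̄₁, x̄₂, s_a);

if θ₄ = σ₂/σ₁ < 1 the best region is defined by s₂² ≤ h′(x̄₁, x̄₂, s_a).

We have now to choose h′ so as to satisfy

\[ \int_{w_0(\phi_1, \phi_2, \phi_3)} p_0\, dw = (1 - \alpha) \int_{W(\phi_1, \phi_2, \phi_3)} p_0\, dW. \]

Now W(φ₁, φ₂, φ₃) is the locus for which x̄₁, x̄₂ and s_a² are constant, and thus the integral
on the right is the product of 1 − α and the frequency function p₀(x̄₁, x̄₂, s_a²). Similarly
that on the left is the integral of this function over the region for which s₂² ≥ h′. Thus

\[ \int_{w_0} p_0\, dw = \int_{h'}^{h''} p_0(\bar{x}_1, \bar{x}_2, s_a^2, s_2^2)\, ds_2^2 \quad \text{in the first case}, \]

with a similar expression but different limits in the second. Now we have for the joint
frequency function of x̄₁, x̄₂, s₁² and s₂² an expression proportional to

\[ s_1^{n_1 - 3}\, s_2^{n_2 - 3} \exp\left[ -\frac{1}{2\sigma^2} \{ n_1(\bar{x}_1 - \mu_1)^2 + n_2(\bar{x}_2 - \mu_2)^2 + (n_1 + n_2)\, s_a^2 \} \right]. \]

Transforming from s₁² to s_a² as variable, we find for the condition, after a little reduction:

\[ \int_{h'}^{h''} \{ (n_1 + n_2) s_a^2 - n_2 s_2^2 \}^{(n_1 - 3)/2}\, s_2^{n_2 - 3}\, ds_2^2 = (1 - \alpha) \int_0^{h''} \{ (n_1 + n_2) s_a^2 - n_2 s_2^2 \}^{(n_1 - 3)/2}\, s_2^{n_2 - 3}\, ds_2^2, \]

where h″ = (n₁ + n₂) s_a² / n₂. On substituting n₂s₂² = (n₁ + n₂) s_a² u we find:

\[ \int_{u_\alpha}^{1} u^{(n_2 - 3)/2} (1 - u)^{(n_1 - 3)/2}\, du = (1 - \alpha) \int_0^1 u^{(n_2 - 3)/2} (1 - u)^{(n_1 - 3)/2}\, du = (1 - \alpha)\, B\left( \frac{n_2 - 1}{2}, \frac{n_1 - 1}{2} \right). \]

It follows that u_α and the corresponding limit u′_α of the second case depend only on α, n₁
and n₂. Thus, whatever the values of x̄₁, x̄₂ and s_a, the best critical region is defined by

\[ s_2^2 \ge h' = \frac{n_1 + n_2}{n_2}\, s_a^2\, u_\alpha \quad \text{if } \sigma_2 > \sigma_1, \]

\[ s_2^2 \le h' = \frac{n_1 + n_2}{n_2}\, s_a^2\, u'_\alpha \quad \text{if } \sigma_2 < \sigma_1. \]

These are equivalent to

\[ \frac{n_2 s_2^2}{n_1 s_1^2 + n_2 s_2^2} \ge u_\alpha \quad \text{if } \sigma_2 > \sigma_1, \]

\[ \frac{n_2 s_2^2}{n_1 s_1^2 + n_2 s_2^2} \le u'_\alpha \quad \text{if } \sigma_1 > \sigma_2. \]

If we put

\[ z = \tfrac{1}{2} \log \frac{n_1 (n_2 - 1)\, s_1^2}{n_2 (n_1 - 1)\, s_2^2}, \]

the B-distribution of u reduces to Fisher's form. The result we have reached is therefore
equivalent to showing that the z-test is the best for the ratio of two variances in normal
samples. As usual, there is no U.M.P. test for the whole range of the ratio from 0 to ∞,
but two U.M.P. tests for the ranges 0 to 1 and 1 to ∞ respectively.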
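
The equivalence of the criterion u and Fisher's z can be checked numerically. The sketch below is illustrative and rests on stated assumptions: u is taken as n₂s₂²/(n₁s₁² + n₂s₂²), z as half the logarithm of the ratio of the unbiased variance estimates (first sample over second), and the sample variances use divisor n. The function names and the data are hypothetical.

```python
import math

def summary(sample):
    """Return (n, mean, variance with divisor n)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean)**2 for x in sample) / n
    return n, mean, var

def u_statistic(x1, x2):
    n1, _, v1 = summary(x1)
    n2, _, v2 = summary(x2)
    return n2 * v2 / (n1 * v1 + n2 * v2)

def fisher_z(x1, x2):
    n1, _, v1 = summary(x1)
    n2, _, v2 = summary(x2)
    return 0.5 * math.log(n1 * (n2 - 1) * v1 / (n2 * (n1 - 1) * v2))

def u_from_z(z, n1, n2):
    """Recover u from z: exp(2z)*(n1-1)/(n2-1) equals n1*s1^2 / (n2*s2^2)."""
    ratio = math.exp(2 * z) * (n1 - 1) / (n2 - 1)
    return 1 / (1 + ratio)

x1 = [4.1, 5.3, 3.8, 6.0, 5.1]
x2 = [2.0, 7.5, 4.4, 9.1]
print(u_statistic(x1, x2), u_from_z(fisher_z(x1, x2), len(x1), len(x2)))
```

Because u is a monotonic function of z for given sample sizes, a critical region defined by one is a critical region defined by the other.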


Example 26.9. Difference of Two Means 

Consider again the previous example, where now the variances are unspecified but 
equal and the means y, and jj, = u, + b may have any values. The hypothesis H, is that 
b = 0 and has two degrees of freedom corresponding to u and o. 

Let the alternative H, specify the parameters 

9, = My 9, = oy 0, = by. 

In addition to the quantities required in the previous Example we now use also % and 
sj, the mean and variance of the pooled samples. 

We find that the three conditions of 26.29 are satisfied, and 


NEL 


$i gi (£y — ui) 
gam TE PE (o pa)? + ap. 
Equivalent to this family are the surfaces 
x= C, 
8 = C;,, 


The condition p, > h (u ¢2) Po reduces to 
b, (£y — a) <h’ (Ev, 8p)» 
and as usual we find two cases according as 4; > y, or vice-versa. We consider only the 


first, the second being analogous. a 
Writing v = 3, — 3, > k, we have to determine h’ by 


hi” piv A 
[ne dv= A —a) | n, ad 
a 
where h’”’ and }* are the lower and upper limits of the variation of v for fixed values of x, 
and sj. : 

The frequency function of a, s), v and sj is easily found to be 


n-3 
mte uli ep [BE Gd] 


Ny + "s 

whence that of že sj and v is found to be 
m+n.—4 "oq + Ny se ^ 2 

fe (s = EIE s) D ex|- CIS { (zo — n)? + $i 


E 2 
f e s" 4 (nı + Mz) 8) — nı si — 
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Since Z, and sj are constant over the domains under consideration we have to satisfy 


tnt nie nın m+n 4 " 
accept w= -af (a-z eyes dv 


m (m + ma)? (m + Ms)" 


_ (mr + Ma) Bo pe (m + Ma) so 


M (ni mi) å V (ni Ne) 


where k” = 


_ If we put 
(m+n) zZ 


vun) (142 


this reduces to 


1 ^ dz 
1 2 f e eas Pie l—« 
z(y" +N, — ) o (1+2) 


2 2 
and uer E rte a Mu 
V (my 8} + na 83) Ny + Ng 4 


We have thus arrived at the t-test for the difference of two means in normal variation when
variances are equal. Once again the test we introduced on more or less intuitive grounds
has been shown to be justified in the light of the theory developed in this chapter.
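
The identification of $z$ with "Student's" $t$ can be checked numerically. The sketch below is illustrative only: it assumes that the sample variances are taken with divisor $n$ (the convention of this book), takes $z = (\bar{x}_1-\bar{x}_2)\sqrt{n_1n_2/(n_1+n_2)}\,/\sqrt{n_1s_1^2+n_2s_2^2}$, and uses two small invented samples. The function names are hypothetical.

```python
import math

def mean_and_variance(xs):
    """Mean and variance with divisor n (the convention assumed here)."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / n

def two_sample_z(a, b):
    """z = (mean_a - mean_b) * sqrt(n1*n2/(n1+n2)) / sqrt(n1*s1^2 + n2*s2^2)."""
    n1, n2 = len(a), len(b)
    m1, v1 = mean_and_variance(a)
    m2, v2 = mean_and_variance(b)
    return (m1 - m2) * math.sqrt(n1 * n2 / (n1 + n2)) / math.sqrt(n1 * v1 + n2 * v2)

def pooled_t(a, b):
    """Classical pooled-variance t statistic, computed independently of z."""
    n1, n2 = len(a), len(b)
    m1, v1 = mean_and_variance(a)
    m2, v2 = mean_and_variance(b)
    pooled = (n1 * v1 + n2 * v2) / (n1 + n2 - 2)   # unbiased pooled variance
    return (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))

# Invented data, for illustration only.
sample_a = [5.1, 4.9, 6.0, 5.5, 5.8]
sample_b = [4.2, 4.8, 5.0, 4.4]
nu = len(sample_a) + len(sample_b) - 2
z = two_sample_z(sample_a, sample_b)
t = pooled_t(sample_a, sample_b)
print(z * math.sqrt(nu), t)   # the two numbers agree: t = z * sqrt(n1 + n2 - 2)
```

The relation $t = z\sqrt{n_1+n_2-2}$ is what allows the integral in $z$ to be referred to the ordinary $t$-tables.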


Linear Hypotheses in Normal Variation 

26.30. Several of the hypotheses dealt with in foregoing examples are particular 
cases of a general class known as linear hypotheses, which accounts for the fact that we 
keep arriving at the same sort of conclusions respecting them. 

Suppose we have n independent variates typified by $x_i$ distributed in the normal form

\[ dF = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2\sigma^2}(x_i-\mu_i)^2\right\}dx_i \]

with common variance $\sigma^2$ but different means. Suppose the means are connected with
$r + s$ unknown parameters $\theta_1 \ldots \theta_r \ldots \theta_{r+s}$ by linear equations of the type

\[ \mu_i = c_{1i}\theta_1 + c_{2i}\theta_2 + \ldots + c_{r+s,\,i}\theta_{r+s}, \qquad i = 1, \ldots, n. \tag{26.51} \]

Suppose further that the hypothesis $H_0$ specifies r parameters

\[ \theta_1 = \theta_1^0, \ \ldots, \ \theta_r = \theta_r^0 \]

and hence is composite with s degrees of freedom. Then $H_0$ will be called a “linear
hypothesis”. The reader can verify for himself that “Student's” hypothesis, and the
hypothesis as to the difference of two means when variances are equal, are of this type.
The homogeneity test in variance-analysis and the test of regression coefficients are also
reducible to the same form. If, of course, $H_0$ specifies r linear relations among the $\theta$'s
instead of the $\theta$'s themselves, it can be reduced to a hypothesis which specifies the $\theta$'s
directly, except perhaps in degenerate cases which need not detain us.


26.31. The theory developed in the earlier part of the chapter for composite
hypotheses may be applied to linear hypotheses as we have defined them, and the argument
follows exactly that of Examples 26.8 and 26.9. It is readily verified that the three
conditions of 26.29 are satisfied. We have, for the unspecified parameters $\theta_j$ ($j = r+1, \ldots, r+s$)
and $\sigma$,

\[ \phi_j = \frac{\partial\log p_0}{\partial\theta_j} = \frac{1}{\sigma^2}\sum_{i=1}^{n} c_{ji}(x_i-\mu_i), \]
\[ \phi_j' = \text{constant}, \]
\[ \phi_{r+s+1} = \frac{\partial\log p_0}{\partial\sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i-\mu_i)^2, \]
\[ \phi_{r+s+1}' = -\frac{2n}{\sigma^2} - \frac{3}{\sigma}\,\phi_{r+s+1}. \]

We can therefore find similar regions $w(\phi_{r+1}, \ldots, \phi_{r+s}, \phi_{r+s+1})$ and select from them the best
critical regions in the usual manner. We will omit the rather cumbrous algebra and quote
the following result (Kolodzieczyk, 1935).

Transform to new variates $\xi_1 \ldots \xi_{r+s}$, $y_{r+s+1} \ldots y_n$ by the equation

\[ x_k = \mu_k + \sum_{j=1}^{r+s} c_{jk}\xi_j + \sum_{j=r+s+1}^{n} c_{jk}y_j, \tag{26.54} \]

where the c's are those given in (26.51) for $j \le r + s$ and the other c's are orthogonal, i.e.

\[ \sum_{i=1}^{n} c_{ji}c_{ki} = \begin{cases} 0, & k \ne j, \ j > r+s \\ 1, & k = j, \ j > r+s. \end{cases} \tag{26.55} \]

Define

\[ nS_a^2 = \sum_{j=r+s+1}^{n} y_j^2 \tag{26.56} \]


and

\[ nS_c^2 = \sum_{k=1}^{n}\Bigl(\sum_{j=1}^{r+s} c_{jk}\xi_j\Bigr)^2. \tag{26.57} \]

A further transformation of $\xi_{r+1} \ldots \xi_{r+s}$ is now made to variables $y_{r+1} \ldots y_{r+s}$ so
that (26.57) becomes

\[ nS_c^2 = \sum_{j,k=1}^{r} R_{jk}\xi_j\xi_k + \sum_{k=r+1}^{r+s} y_k^2 \tag{26.58} \]
\[ \phantom{nS_c^2} = nS_b^2 + \sum_{k=r+1}^{r+s} y_k^2. \tag{26.59} \]

The coefficients R can, of course, be obtained from the c's by ordinary determinantal
algebra.

Writing now $\epsilon_j = \theta_j - \theta_j^0$, i.e. the difference between $\theta_j$ on the alternative hypothesis
and its value if $H_0$ is true, we find that the best critical region is given by


\[ v = \frac{\sum_{j,k=1}^{r} R_{jk}\epsilon_j\xi_k}{\sqrt{n(S_a^2+S_b^2)}\,\sqrt{\sum_{j,k=1}^{r} R_{jk}\epsilon_j\epsilon_k}} \ge v_0, \tag{26.60} \]


where v is distributed in the form

\[ dF \propto (1-v^2)^{\frac{n-s-3}{2}}\,dv, \qquad -1 \le v \le 1, \tag{26.61} \]

and $v_0$ is given by

\[ 1-\alpha = \int_{v_0}^{1} dF. \tag{26.62} \]


26.32. There is one interesting conclusion to be drawn from (26.60). If a U.M.P.
test exists, v should be independent of $\theta_j$ and hence of $\epsilon_j$. This appears to be possible
only if the denominator in the second part of (26.60) is rational. But this denominator
is seen from (26.59) to have the coefficients of a positive definite form and hence is only
rational if $r = 1$. We conclude that if $r \ge 2$ no U.M.P. test is possible for linear hypotheses
in normal variation.

We have already seen that under general conditions no U.M.P. test exists for $r = 1$.
A similar conclusion follows from (26.60) if $r = 1$, for it then becomes

\[ v = \frac{\epsilon_1\,\xi_1\sqrt{R_{11}}}{|\epsilon_1|\sqrt{n(S_a^2+S_b^2)}}, \tag{26.63} \]

which, as usual, leads to two cases according as $\epsilon_1 \gtrless 0$.


26.33. We will pause at this point to review our results. We began by defining two
kinds of error and showing that a test could be defined as “best” for a single alternative
hypothesis if it controlled the first kind and reduced the second to a minimum. When
there is a class of admissible alternatives we may sometimes arrive at a U.M.P. test which
will minimise errors of the second kind for any member of the class, and such a test may
be regarded as the best attainable. Though the U.M.P. test does not exist in the great
majority of cases, we may find tests which are U.M.P. for either $\theta_1 > \theta_0$ or $\theta_1 < \theta_0$. Such
tests have been reached for “Student's” hypothesis and several others in common use,
and are found to give the same tests as those introduced on rather intuitive grounds in
Chapter 21.


26.34. The absence of a U.M.P. test implies that in the majority of cases we have
to look for other criteria to provide “best” tests. In the remainder of this chapter and
in the next we shall consider several lines of approach which have been developed:

(a) Relying on 26.18 we may evolve tests based on the likelihood ratio. These will
give U.M.P. tests if such exist, and in the contrary case will do their best, so to speak, by
finding the greatest common denominator among the best critical regions.

(b) We may consider the properties of tests when the sample number n tends to infinity,
and so obtain tests which are U.M.P. in the limit. Such tests, like maximum likelihood
estimators, may be employed on the grounds that they are “best” for large n and
presumably good for small n.

(c) We may derive a new criterion from the concept of bias in statistical tests, which
will be explained in the next chapter.

(d) Recognizing that there is no test which is U.M.P. everywhere, we may seek for
one which is U.M.P. in the neighbourhood of the true value. The idea behind this approach
is that it will be more important to detect errors in the neighbourhood of the true value,
and that large errors may be left to look after themselves, either because they are infrequent
or because almost any “reasonable” test will reveal them.*

(e) When a number of independent parameters are involved, we may abandon the
attempt to test for each separately and confine our attention to the class of hypotheses for
which they are functionally related, e.g. by $\psi = f(\theta_1 \ldots \theta_k)$. This reduces our problem
to the case of a single parameter $\psi$, and we may be able to show that a particular test is the
best in the sense that it is U.M.P. with respect to all other tests depending on the single
function $\psi$ of the unknown parameters.

We proceed to consider these approaches. 


Tests Based on Likelihood 


26.35. Suppose that for a given member of a composite hypothesis $H_0$ the joint
sampling distribution of the variables $x_1 \ldots x_n$ has a frequency function $p_0$ (which is,
of course, the likelihood). Considering the x's as fixed, we may examine the variation of
$p_0$ according to variation in the unspecified parameters, which form a set, say
$\omega$. Let $p_0(\omega\ \text{max.})$ be the maximum value of $p_0$ for such variation. Similarly, if $\Omega$ is
the class of admissible alternatives $H_1$, let $p(\Omega\ \text{max.})$ be the maximum of the likelihood
for variations of all the parameters $\theta_1 \ldots \theta_{r+s}$. Write

\[ \lambda = \frac{p_0(\omega\ \text{max.})}{p(\Omega\ \text{max.})}. \tag{26.64} \]

Then a possible criterion for accepting $H_0$ is to take as critical regions those points for which

\[ \lambda \le \text{constant} = C, \text{ say}, \tag{26.65} \]

where C is determined by relation to a probability level $\alpha$ from the sampling distribution
of $\lambda$, which of course is independent of the unknown parameters. In defining $\lambda$ we have
assumed that the maxima on the right of (26.64) exist, but we can give the equation greater
generality by taking $p_0(\omega\ \text{max.})$ as the upper bound of values of $p_0$ in the set $\omega$ where no
maximum exists; and so for $\Omega$.

In this form the criterion states that we are to accept $H_0$ if the maximum likelihood
in the set of permissible $H_0$'s is greater than a specified proportion of that in the set of
alternatives $H_1$. In doing so we control the first kind of error in the ordinary way. So
far as concerns the second kind of error we saw in 26.18 that for $H_0$ simple the criterion
provided a sort of highest common factor among available tests; and presumably qualities
of this kind will be equally useful when $H_0$ is composite.
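
As an illustration of (26.64), the sketch below works out $\lambda$ for “Student's” hypothesis (mean equal to an assigned value $\mu_0$, variance unspecified) by maximising the normal likelihood directly, and compares it with the closed form $\lambda = \{1 + t^2/(n-1)\}^{-n/2}$. The data and the function names are invented for the purpose; only the standard normal likelihood is assumed.

```python
import math

def max_log_likelihood(xs, mu=None):
    """Maximised normal log-likelihood; mu is held fixed if given, else estimated."""
    n = len(xs)
    m = sum(xs) / n if mu is None else mu
    var_hat = sum((x - m) ** 2 for x in xs) / n      # ML estimate of sigma^2
    return -0.5 * n * (math.log(2 * math.pi * var_hat) + 1)

def likelihood_ratio(xs, mu0):
    """lambda = p0(omega max.) / p(Omega max.) for the hypothesis mu = mu0."""
    return math.exp(max_log_likelihood(xs, mu0) - max_log_likelihood(xs))

def student_t(xs, mu0):
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / (n - 1)     # unbiased variance
    return (m - mu0) / math.sqrt(s2 / n)

xs = [10.2, 9.7, 10.9, 10.4, 9.9, 10.6]              # invented observations
mu0 = 10.0
n = len(xs)
lam = likelihood_ratio(xs, mu0)
t = student_t(xs, mu0)
lam_from_t = (1 + t * t / (n - 1)) ** (-n / 2)
print(lam, lam_from_t)                               # the two values coincide
```

Since $\lambda$ is a decreasing function of $t^2$, the region $\lambda \le C$ is the region $|t| \ge$ constant, which is why the likelihood criterion reproduces the two-sided form of “Student's” test.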


The Problem of k Samples 

26.36. We will illustrate the theory of the likelihood tests by discussing a problem
of considerable practical importance. Suppose we have a sample from each of k normal
populations, $x_{ij}$ being the jth member of the ith sample. Let

$n_i$ be the number in the ith sample;

$N = \Sigma(n_i)$ be the total number of observations;

$\bar{x}_i$ be the mean of the ith sample;

$s_i^2$ be the variance of the ith sample.

* An alternative is to minimise errors of the second kind for larger deviations,
on the ground that large errors are more serious than small ones. I understand from Dr. B. L. Welch
that he considered this approach shortly before the war; the results did not differ very materially from
those given by requiring optimum properties near the true value in the case he examined, and the
results were not published.

We will consider three different hypotheses $H_0$:


(1) $H$, that all populations are the same and hence have the same unspecified mean and
unspecified variance.

(2) $H_1$, that they have the same variance but different unspecified means $\mu_1 \ldots \mu_k$.

(3) $H_2$, when it is known that they have the same variance, that they have the same means.


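We have for the joint likelihood

\[ p = \frac{1}{(2\pi)^{\frac{N}{2}}\prod_i\sigma_i^{n_i}}\exp\left[-\sum_{i=1}^{k}\frac{n_i\{(\bar{x}_i-\mu_i)^2+s_i^2\}}{2\sigma_i^2}\right]. \]

Consider first of all $H$. We have for $p(\Omega\ \text{max.})$,

\[ \hat{\mu}_i = \bar{x}_i \tag{26.66} \]
\[ \hat{\sigma}_i = s_i \tag{26.67} \]

and for $p(\omega\ \text{max.})$, putting all the $\mu$'s and $\sigma$'s equal and equating the first partials of $\log p$
to zero,

\[ \hat{\mu} = \bar{x}_0 = \frac{1}{N}\sum_{i=1}^{k} n_i\bar{x}_i \tag{26.68} \]
\[ \hat{\sigma}^2 = s_0^2 = \frac{1}{N}\sum_{i=1}^{k} n_i\{s_i^2 + (\bar{x}_i-\bar{x}_0)^2\}. \tag{26.69} \]

Inserting these values in p we find, after a little reduction,

\[ \lambda_H = \prod_{i=1}^{k}\left(\frac{s_i^2}{s_0^2}\right)^{\frac{n_i}{2}}. \tag{26.70} \]

Similarly it may be shown that

\[ \lambda_{H_1} = \prod_{i=1}^{k}\left(\frac{s_i^2}{s_a^2}\right)^{\frac{n_i}{2}}, \tag{26.71} \]

where

\[ s_a^2 = \frac{1}{N}\sum_{i=1}^{k} n_is_i^2, \tag{26.72} \]

and also that

\[ \lambda_{H_2} = \left(\frac{s_a^2}{s_0^2}\right)^{\frac{N}{2}}. \tag{26.73} \]

It will be noticed that $\lambda_H = \lambda_{H_1}\lambda_{H_2}$.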
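
The three criteria are easily computed from the sample summaries. The following sketch assumes the forms $\lambda_H = \prod(s_i^2/s_0^2)^{n_i/2}$, $\lambda_{H_1} = \prod(s_i^2/s_a^2)^{n_i/2}$ and $\lambda_{H_2} = (s_a^2/s_0^2)^{N/2}$, with all variances taken with divisor $n$, and checks the product relation on three invented samples. The function names are hypothetical.

```python
import math

def summarize(sample):
    """Return (n_i, mean, variance with divisor n_i) for one sample."""
    n = len(sample)
    m = sum(sample) / n
    return n, m, sum((x - m) ** 2 for x in sample) / n

def k_sample_criteria(samples):
    """Return (lambda_H, lambda_H1, lambda_H2) for a list of samples."""
    stats = [summarize(s) for s in samples]
    total = sum(n for n, _, _ in stats)                      # N
    grand_mean = sum(n * m for n, m, _ in stats) / total     # x-bar_0
    s_a2 = sum(n * v for n, _, v in stats) / total           # within-sample variance
    s_02 = s_a2 + sum(n * (m - grand_mean) ** 2 for n, m, _ in stats) / total
    # Work with logarithms to avoid underflow for large samples.
    log_h = sum(0.5 * n * math.log(v / s_02) for n, _, v in stats)
    log_h1 = sum(0.5 * n * math.log(v / s_a2) for n, _, v in stats)
    log_h2 = 0.5 * total * math.log(s_a2 / s_02)
    return math.exp(log_h), math.exp(log_h1), math.exp(log_h2)

# Invented data, for illustration only.
samples = [
    [8.1, 7.9, 8.4, 8.0],
    [7.2, 7.8, 7.5, 7.7, 7.4],
    [8.6, 8.9, 8.2],
]
lam_h, lam_h1, lam_h2 = k_sample_criteria(samples)
print(lam_h, lam_h1 * lam_h2)   # equal, up to rounding
```

All three quantities lie between 0 and 1, small values telling against the hypothesis concerned.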


26.37. The function $\lambda_{H_2}$ may be related to the correlation ratio $\eta^2$. We have

\[ 1-\eta^2 = \frac{s_a^2}{s_0^2} \tag{26.74} \]

and hence

\[ \lambda_{H_2} = (1-\eta^2)^{\frac{N}{2}}. \tag{26.75} \]

The distribution of $\lambda_{H_2}$ is thus obtainable directly from the known form for $\eta^2$ in samples
from an uncorrelated population.

We also find

\[ \lambda_H^{\frac{2}{N}} = \frac{1}{s_0^2}\prod(s_i^2)^{\frac{n_i}{N}} \tag{26.76} \]
\[ \lambda_{H_1}^{\frac{2}{N}} = \frac{1}{s_a^2}\prod(s_i^2)^{\frac{n_i}{N}}. \tag{26.77} \]


2 
The distribution of $(\lambda_{H_2})^{\frac{2}{N}}$ is that of $1 - \eta^2$, where the distribution of $\eta^2$ is

\[ dF \propto (\eta^2)^{\frac{k-3}{2}}(1-\eta^2)^{\frac{N-k-2}{2}}\,d(\eta^2). \tag{26.78} \]

It can accordingly be tested in this distribution or the related z-form. This is, in fact,
the criterion used in the analysis of variance for homogeneity tests, and it is interesting to
remark that the z-test here arises in considering the hypothesis that the various distributions
parent to the sample values, being already known to have the same variance, have the
same mean. The other form of hypothesis, $H$, is that the samples come from the same
population, and the equality of variance is not part of the data but part of the hypothesis.
We are not then surprised, or should not be so, to find that the $\lambda_H$ criterion leads to a
different test.


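26.38. The moments of the distribution of $\lambda_H$ may be obtained as follows. The
joint distribution of $\bar{x}_i$ and $s_i$ is

\[ dF \propto \prod s_i^{n_i-2}\exp\left[-\frac{1}{2\sigma^2}\sum n_i\{(\bar{x}_i-\mu)^2+s_i^2\}\right]\prod d\bar{x}_i\prod ds_i. \tag{26.79} \]

The distribution of means is independent of that of variances and can be ignored.
Further, if

\[ \chi^2 = \frac{1}{\sigma^2}\sum n_i(\bar{x}_i-\bar{x}_0)^2 \]

then $\chi^2$ is also independent of the variances, and we have

\[ dF \propto \prod\left\{s_i^{n_i-2}\exp\left(-\frac{n_is_i^2}{2\sigma^2}\right)\right\}\chi^{k-2}\exp(-\tfrac12\chi^2)\prod ds_i\,d\chi. \tag{26.80} \]

Put now

\[ y_i = \frac{n_is_i^2}{Ns_0^2}, \tag{26.81} \]

and note that

\[ \sigma^2\chi^2 = Ns_0^2 - \sum n_is_i^2 = Ns_0^2\,(1-\sum y_i). \tag{26.82} \]

Transforming to variables $y_1 \ldots y_k$, $s_0$, we find

\[ dF \propto \prod y_i^{\frac{n_i-3}{2}}\,(1-\sum y_i)^{\frac{k-3}{2}}\,s_0^{N-2}\exp\left(-\frac{Ns_0^2}{2\sigma^2}\right)\prod dy_i\,ds_0, \]

whence, for the distribution of the y's,

\[ dF \propto \prod y_i^{\frac{n_i-3}{2}}\,(1-\sum y_i)^{\frac{k-3}{2}}\prod dy_i. \tag{26.83} \]

Now

\[ \lambda_H = \prod\left(\frac{Ny_i}{n_i}\right)^{\frac{n_i}{2}} \tag{26.84} \]

and hence we may find the moments of $\lambda_H$ by integrating its powers over the distribution
(26.83). Integrals of this kind, known as Dirichlet's, are expressible in terms of gamma
functions and we find, for the pth moment of $\lambda_H$ about zero,

\[ \mu_p'(\lambda_H) = \frac{\Gamma\{\tfrac12(N-1)\}}{\Gamma\{\tfrac12[(p+1)N-1]\}}\prod_{i=1}^{k}\left(\frac{N}{n_i}\right)^{\frac{pn_i}{2}}\frac{\Gamma\{\tfrac12[(p+1)n_i-1]\}}{\Gamma\{\tfrac12(n_i-1)\}}. \tag{26.85} \]

When all the n's are equal this reduces to

\[ \mu_p'(\lambda_H) = k^{\frac{pN}{2}}\,\frac{\Gamma\{\tfrac12(N-1)\}}{\Gamma\{\tfrac12[(p+1)N-1]\}}\left[\frac{\Gamma\{\tfrac12[(p+1)n-1]\}}{\Gamma\{\tfrac12(n-1)\}}\right]^{k}. \tag{26.86} \]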
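26.39. For the criterion $\lambda_{H_1}$ we start from the distribution

\[ dF \propto \prod s_i^{n_i-2}\exp\left\{-\frac{1}{2\sigma^2}\sum(n_is_i^2)\right\}\prod ds_i \tag{26.87} \]

and on putting

\[ y_i = \frac{n_is_i^2}{Ns_a^2} \ (i = 1 \ldots k-1), \qquad n_ks_k^2 = Ns_a^2\Bigl(1-\sum_{1}^{k-1}y_i\Bigr), \tag{26.88} \]

we find, in much the same way as before,

\[ dF \propto \prod_{1}^{k-1}y_i^{\frac{n_i-3}{2}}\Bigl(1-\sum_{1}^{k-1}y_i\Bigr)^{\frac{n_k-3}{2}}\prod_{1}^{k-1}dy_i. \tag{26.89} \]

Further,

\[ \lambda_{H_1} = \prod_{1}^{k-1}\left(\frac{Ny_i}{n_i}\right)^{\frac{n_i}{2}}\left\{\frac{N}{n_k}\Bigl(1-\sum_{1}^{k-1}y_i\Bigr)\right\}^{\frac{n_k}{2}}, \tag{26.90} \]

whence we find

\[ \mu_p'(\lambda_{H_1}) = \frac{\Gamma\{\tfrac12(N-k)\}}{\Gamma\{\tfrac12[(p+1)N-k]\}}\prod_{i=1}^{k}\left(\frac{N}{n_i}\right)^{\frac{pn_i}{2}}\frac{\Gamma\{\tfrac12[(p+1)n_i-1]\}}{\Gamma\{\tfrac12(n_i-1)\}}. \tag{26.91} \]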


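26.40. For large $n_i$ we find, in virtue of the Stirling approximation to the gamma
function,

(1) for $\lambda_H$: $\mu_p' \to \dfrac{1}{(p+1)^{k-1}}$;

(2) for $\lambda_{H_1}$: $\mu_p' \to \dfrac{1}{(p+1)^{\frac{k-1}{2}}}$;

(3) for $\lambda_{H_2}$: $\mu_p' \to \dfrac{1}{(p+1)^{\frac{k-1}{2}}}$.

These limiting forms are the moments of the distributions

(1) \[ dF = \frac{(-\log\lambda)^{k-2}}{\Gamma(k-1)}\,d\lambda, \qquad 0 \le \lambda \le 1; \]

(2) and (3) \[ dF = \frac{(-\log\lambda)^{\frac{k-3}{2}}}{\Gamma\{\tfrac12(k-1)\}}\,d\lambda. \]

Hence, by the transformation $\lambda = e^{-\frac12\chi^2}$ we see that approximately $-2\log\lambda_H$ is distributed as
$\chi^2$ with $\nu = 2k-2$, and $-2\log\lambda_{H_1}$ and $-2\log\lambda_{H_2}$ as $\chi^2$ with $\nu = k-1$.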


26.41. For small samples Neyman and Pearson have suggested approximating to
the distributions of $\lambda_H^{\frac{2}{N}}$ and $\lambda_{H_1}^{\frac{2}{N}}$ by identifying their lower moments with those of the
form

\[ dF \propto x^{m_1-1}(1-x)^{m_2-1}\,dx. \]

This possibility has been examined in detail by Nayer (1936) for the hypothesis $H_1$ when
all the n's are equal. The distribution of $\lambda_{H_1}$ has also been studied by Wilks and Thompson
(1937a).


26.42. Modified forms of the above tests have been considered by various authors.
We may write

\[ \log\lambda_{H_1} = \tfrac12\sum n_i\log\frac{s_i^2}{s_a^2}, \tag{26.92} \]

where, of course,

\[ s_a^2 = \frac{1}{N}\sum n_is_i^2. \]

In short, $s_a^2$ is a weighted mean of the $s_i^2$ and $\lambda_{H_1}^{\frac{2}{N}}$ is the ratio of their weighted geometric
mean to $s_a^2$. Bartlett (1937c) has proposed using the degrees of freedom $\nu_i$ ($= n_i - 1$)
instead of $n_i$ in these equations, that is to say, defines a criterion

\[ \log\mu = \tfrac12\sum\nu_i\log\frac{s_i'^2}{s'^2}, \qquad \nu_is_i'^2 = n_is_i^2, \quad s'^2 = \frac{1}{\nu}\sum\nu_is_i'^2, \quad \nu = \sum\nu_i. \tag{26.93} \]

This test is, in the sense defined in the next chapter, unbiassed, whereas that based on
$\lambda_{H_1}$ is not. Bartlett also suggested as an approximation that $-\dfrac{2}{c}\log\mu$ could be regarded
as distributed as $\chi^2$ with $k - 1$ degrees of freedom, c being given by

\[ c = 1 + \frac{1}{3(k-1)}\left\{\sum\left(\frac{1}{\nu_i}\right) - \frac{1}{\nu}\right\}. \tag{26.94} \]

This has recently been reconsidered by Hartley (1940), who showed that it is not very exact
for large and gave a better approximation which can be reduced to tabular form. Cf.
Exercise 27.2.
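
A minimal numerical sketch of Bartlett's modification follows. It assumes the usual form of the criterion, $M = \nu\log s'^2 - \sum\nu_i\log s_i'^2$ with $s_i'^2 = n_is_i^2/\nu_i$, and the correcting factor $c = 1 + \{\sum(1/\nu_i) - 1/\nu\}/\{3(k-1)\}$, so that $M/c$ is referred to $\chi^2$ with $k-1$ degrees of freedom. The variances used are those of the six samples of six quoted in Exercise 26.6; the function name is hypothetical.

```python
import math

def bartlett_statistic(variances, sizes):
    """Return (M, c, M / c) for Bartlett's modified criterion.

    variances: sample variances s_i^2 taken with divisor n_i
    sizes:     sample numbers n_i
    """
    k = len(variances)
    dof = [n - 1 for n in sizes]                                    # nu_i = n_i - 1
    unbiased = [n * v / (n - 1) for n, v in zip(sizes, variances)]  # n_i s_i^2 / nu_i
    nu = sum(dof)
    pooled = sum(d * u for d, u in zip(dof, unbiased)) / nu
    m = nu * math.log(pooled) - sum(d * math.log(u) for d, u in zip(dof, unbiased))
    c = 1 + (sum(1 / d for d in dof) - 1 / nu) / (3 * (k - 1))
    return m, c, m / c

# Variances of the six samples of six members given in Exercise 26.6.
variances = [24722, 94133, 149733, 45037, 88480, 49921]
m, c, corrected = bartlett_statistic(variances, [6] * 6)
print(m, c, corrected)
```

On these figures the corrected criterion comes out at roughly 4.5, well below 11.07, the upper 5 per cent. point of $\chi^2$ with 5 degrees of freedom, in agreement with the conclusion of the exercise that there is no evidence of heterogeneity.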
Likelihood Criteria for the Linear Hypothesis
26.43. We now proceed to consider the application of the likelihood criterion to the
class of linear hypotheses as defined in 26.30. We have, for the likelihood function,

\[ p = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n}\exp\left\{-\frac{1}{2\sigma^2}\sum(x_i-\mu_i)^2\right\}. \tag{26.95} \]

Writing $S^2 = \sum(x_i-\mu_i)^2$ we have, for the stationary values of p, with respect to $\sigma$ and
the parameters $\theta$ (related to the $\mu$'s by (26.51)),

\[ \frac{\partial}{\partial\sigma}\log p = -\frac{n}{\sigma} + \frac{S^2}{\sigma^3} = 0 \tag{26.96} \]
\[ \frac{\partial}{\partial\theta_j}\log p = \frac{1}{\sigma^2}\sum_i(x_i-\mu_i)\,c_{ji} = 0. \tag{26.97} \]

This last equation is clearly the one we should get if we were seeking to minimise $S^2$ itself
for variations in the $\theta$'s. Let $nS_a^2$ be this minimum value. We shall then have, from
(26.96),

\[ \hat{\sigma}^2 = S_a^2. \tag{26.98} \]

The maximum of p in the class $\Omega$ of admissible hypotheses is then

\[ p(\Omega\ \text{max.}) = \left(\frac{1}{2\pi S_a^2}\right)^{\frac{n}{2}}e^{-\frac{n}{2}}. \tag{26.99} \]

Similarly the maximum of p in the class $\omega$ for which $\theta_1 \ldots \theta_r$ are fixed and the other
s $\theta$'s vary, is found to be

\[ p(\omega\ \text{max.}) = \left\{\frac{1}{2\pi(S_a^2+S_b^2)}\right\}^{\frac{n}{2}}e^{-\frac{n}{2}}, \tag{26.100} \]

where $n(S_a^2+S_b^2)$ is the minimum of $S^2$ under the conditions that $\theta_1 \ldots \theta_r$ are fixed.
Thus we find for the likelihood ratio $\lambda$

\[ \lambda^{\frac{2}{n}} = \frac{1}{1 + \dfrac{S_b^2}{S_a^2}}, \tag{26.101} \]

or, if more convenient, we may use the function $S_b/S_a$ to provide a criterion.
Now we make the transformation (26.54) and show that the values $S_a$ and $S_b$ as we
have defined them here have, in fact, the values given by (26.56) and (26.59). We have,
from (26.54),

\[ S^2 = \sum_{k=1}^{n}\Bigl\{\sum_{j=1}^{r+s}c_{jk}\xi_j + \sum_{j=r+s+1}^{n}c_{jk}y_j\Bigr\}^2 \]
\[ \phantom{S^2} = \sum_k\Bigl(\sum_j c_{jk}\xi_j\Bigr)^2 + \sum_k\Bigl(\sum_j c_{jk}y_j\Bigr)^2 \]
\[ \phantom{S^2} = \sum_k\Bigl(\sum_j c_{jk}\xi_j\Bigr)^2 + \sum_{j=r+s+1}^{n}y_j^2. \]

Since $nS_a^2$ is the minimum of $S^2$ for all variations of the $\theta$'s and $\xi$ and $y$ are independent
of the $\theta$'s, we must have

\[ nS_a^2 = \sum y_j^2. \]

Also, since $n(S_a^2+S_b^2)$ is the minimum of $S^2$ when the values $\theta_1 \ldots \theta_r$ are fixed, $nS_b^2$ is seen
to have the value given in (26.59).

We have also

\[ S^2 = nS_a^2 + nS_c^2, \tag{26.102} \]

where $nS_c^2 = \sum_k\bigl(\sum_j c_{jk}\xi_j\bigr)^2$, and the frequency function of $\xi$'s and $y$'s is given by

\[ f(\xi_1 \ldots \xi_{r+s}, y_{r+s+1} \ldots y_n) \propto \exp\left\{-\frac{n}{2\sigma^2}(S_a^2+S_c^2)\right\}. \tag{26.103} \]

Now $nS_a^2$ is the sum of squares of $n - r - s$ normal variates, and hence

\[ f(S_a) \propto S_a^{n-r-s-1}\exp\left(-\frac{nS_a^2}{2\sigma^2}\right). \tag{26.104} \]

Hence, since the $\xi$'s are independent of the $y$'s, and since $S_a^2$ depends only on the $y$'s,

\[ f(S_a, \xi_1 \ldots \xi_{r+s}) \propto S_a^{n-r-s-1}\exp\left\{-\frac{n}{2\sigma^2}(S_a^2+S_c^2)\right\}. \tag{26.105} \]

We have seen, in effect, that $nS_b^2$ is the minimum value of $nS_c^2$. It depends on $\xi_1 \ldots \xi_r$
and hence is independent of $S_a$ and is distributed as

\[ f(S_b) \propto S_b^{r-1}\exp\left(-\frac{nS_b^2}{2\sigma^2}\right). \]

Thus we have

\[ f(S_a, S_b) \propto S_a^{n-r-s-1}S_b^{r-1}\exp\left\{-\frac{n(S_a^2+S_b^2)}{2\sigma^2}\right\}. \tag{26.106} \]

Putting now $Z = S_b/S_a$, we find

\[ f(Z) \propto \frac{Z^{r-1}}{(1+Z^2)^{\frac{n-s}{2}}}, \tag{26.107} \]

which may be reduced to Fisher's form by putting

\[ z = \tfrac12\log\frac{S_b^2\,(n-r-s)}{rS_a^2} = \log Z + \tfrac12\log\frac{n-r-s}{r}. \tag{26.108} \]
rS r 


We have thus reduced the test of the linear hypothesis to the z-test and it is seen that
several of the tests introduced in Chapter 21 can be justified on the likelihood criterion.
These include the “Student” test for one mean, the extended form for the difference of
two means, and the test for the ratio of variances. Certain other tests in which the
z-distribution (which, of course, reduces to the t-distribution for $\nu_1 = 1$) appears, such as
that of the correlation ratio, the multiple correlation coefficient and regression coefficients,
also depend on the linear hypotheses, and in the light of the theory here presented are
seen to be different aspects of the same thing, at least so far as the testing of hypotheses
is concerned.
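
The reduction to the z-test may be illustrated on the simplest regression case. In the sketch below the means are supposed to be $\mu_i = \theta_1u_i + \theta_2$ for known $u_i$ (an assumption made for the example), and the hypothesis tested is $\theta_1 = 0$, so that $r = 1$ and $s = 1$. The quantity $nS_a^2$ is then the residual sum of squares about the fitted line and $n(S_a^2+S_b^2)$ the sum of squares about the mean. The data and function names are invented.

```python
import math

def regression_z(u, x):
    """Return (z, t) for the hypothesis that the regression slope is zero."""
    n = len(x)
    ubar, xbar = sum(u) / n, sum(x) / n
    suu = sum((ui - ubar) ** 2 for ui in u)
    sux = sum((ui - ubar) * (xi - xbar) for ui, xi in zip(u, x))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sux / suu
    n_sa2 = sxx - slope * sux        # n S_a^2: minimum of S^2 over both parameters
    n_sb2 = sxx - n_sa2              # n S_b^2: increase in the minimum when slope = 0
    r, s = 1, 1
    z = 0.5 * math.log(n_sb2 * (n - r - s) / (r * n_sa2))           # Fisher's z
    t = slope / math.sqrt(n_sa2 / (n - r - s) / suu)                # "Student's" t
    return z, t

# Invented data, for illustration only.
u = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]
z, t = regression_z(u, x)
print(math.exp(2 * z), t * t)        # e^{2z} equals t^2 when r = 1
```

With $r = 1$ the variance-ratio $e^{2z}$ is simply $t^2$, which is the sense in which the z-distribution reduces to the t-distribution.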
26.44. We will indicate briefly, without going into the complicated mathematics
involved, some interesting results obtained by P. C. Tang (1938) and P. L. Hsu (1941b)
concerning the power of the z-test as applied to linear hypotheses.

The functions $S_a^2$ and $S_b^2$, as we have seen, are distributed independently in the
$\chi^2$-form, and their ratio accordingly in Fisher's form. From this viewpoint the test of
the linear hypothesis is a generalisation of the test of homogeneity in the analysis of
variance. Tang considers the distribution of

\[ E^2 = \frac{S_b^2}{S_a^2+S_b^2} = 1 - \lambda^{\frac{2}{n}} \tag{26.109} \]

and the variation for errors of the second kind, namely, when the values $\theta_1 \ldots \theta_r$ are
different from the specified values. He shows that the power of the test depends, not on
individual alternative values, but on a single function of the $\theta$'s. He also obtains the
power function and tabulates it.

Hsu then considers other possible tests which are based on this single function and
shows that in this class of test the z-test or the equivalent $E^2$-test is the uniformly most
powerful.


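26.45. For large samples, when maximum likelihood estimators of the parameters
exist, the distribution of $-2\log\lambda$ is that of $\chi^2$ with r degrees of freedom. For the
distribution may then be written (see 17.46)

\[ dF = A\exp\left\{-\tfrac12\sum_{j,k=1}^{r+s} g_{jk}(\hat{\theta}_j-\theta_j)(\hat{\theta}_k-\theta_k)\right\}d\hat{\theta}_1 \ldots d\hat{\theta}_{r+s}, \]

so that

\[ p(\Omega\ \text{max.}) = A. \tag{26.110} \]

If $\theta_1 \ldots \theta_r$ are fixed the likelihood becomes

\[ p = A\exp\left\{-\tfrac12\sum_{j,k=r+1}^{r+s} g_{jk}z_jz_k - \tfrac12 q\right\}, \]

where

\[ q = \sum_{j,k=1}^{r} g_{jk}'(\hat{\theta}_j-\theta_j)(\hat{\theta}_k-\theta_k) \tag{26.111} \]

and $z_j$ is given by $\hat{\theta}_j - \theta_j - L_j$ where $L_j$ is a linear function of the r specified parameters.
Thus

\[ p(\omega\ \text{max.}) = A_0e^{-\frac12 q}, \tag{26.112} \]

where $A_0$ is the value of A when $\theta_j$ takes its true value $\theta_j^0$. Thus, when $H_0$ is true,

\[ \lambda = e^{-\frac12 q}. \tag{26.113} \]

But the characteristic function of $\chi^2$ ($= -2\log\lambda$) is

\[ \int p\,e^{itq}\,d\hat{\theta}_1 \ldots d\hat{\theta}_{r+s} = A\int\exp\left\{-\tfrac12\sum_{j,k=r+1}^{r+s} g_{jk}z_jz_k - \tfrac12(1-2it)\,q\right\}d\hat{\theta}_1 \ldots d\hat{\theta}_{r+s} = \frac{1}{(1-2it)^{\frac{r}{2}}}. \tag{26.114} \]

This is the characteristic function of a quantity distributed as $\chi^2$ with r degrees of
freedom (q being a quadratic form in the r specified parameters only), and hence the
result follows.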


26.46. In concluding this chapter we may mention briefly a question which
frequently presents itself when statistical hypotheses are being tested in practice. Our tests
are based on the observed values obtained in the sampling process, and in order to apply
them we require no prior knowledge of the parameters to which they relate. They can
be used in a state of complete ignorance about the parameters. But suppose some
information is already available; or suppose that we attach varying degrees of importance to the
avoidance of particular types of error. How far are the tests developed in this chapter to
be modified?


26.47. Consider, for example, the situation which has already been mentioned in
connection with the theory of estimation, of the chemist who is assaying the strength of
a particular drug. If the drug has harmful effects in large quantities it may be much more
important for him to detect cases in which the true strength exceeds his hypothetical value
than when the true strength is deficient. Again, the manufacturer of a “guaranteed”
product is usually much more concerned with ensuring that it does not fall below the
guaranteed standard than that it exceeds such standard. In such circumstances we may
be particularly interested in “one-sided” tests of the type $\xi < \xi_0$ and, as we have seen,
there more often occur U.M.P. tests for this class of alternative than in the case when $\xi$
can have any value. We might, therefore, be quite ready to accept such a test, knowing
quite well that it may be insensitive in part of the range of the unknown parameter, merely
because errors in that range are relatively unimportant.

Similarly we might be willing to accept a test which had a poor discriminatory power
in part of the range but compensating advantages elsewhere, simply because we know
beforehand that values of the parameter rarely or never fall into that particular part of
the range. This is equivalent to prior knowledge of the distribution of the values
determining the alternative hypotheses.


26.48. It is difficult to reduce rather vague prior knowledge of a parameter to
numerical form, and hence to extend our theory with great precision to cover these cases; but in
practice it is desirable to consider, before adopting a test, whether any prior knowledge is
available, or whether our interests centre on particular parts of the range. If they do, we
may consider the behaviour of power functions of the possible tests at our disposal and
examine which is the more powerful test in the particular part of the range which interests
us most. The mere fact that the theory developed in this and the succeeding chapter
makes no assumptions about the prior probabilities of admissible alternatives does not
mean that we should be acting sensibly in ignoring any prior information which may be
at hand when applying the theory, or that we need feel compelled to apply tests with
optimum properties in regions where we know the unknown parameter-values will not fall.
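
The comparison of power functions contemplated here may be sketched for the simplest case, the mean of a normal population with known variance. The example assumes a rejection level of 0.05 and measures the departure from the hypothesis by $\delta = \sqrt{n}\,(\xi - \xi_0)/\sigma$; both choices, and the function names, are made for illustration only.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def power_one_sided(delta, level=0.05):
    """Power of the test rejecting for large sample means (alternatives above xi_0)."""
    critical = STANDARD_NORMAL.inv_cdf(1 - level)
    return 1 - STANDARD_NORMAL.cdf(critical - delta)

def power_two_sided(delta, level=0.05):
    """Power of the equal-tails test rejecting for large deviations in either direction."""
    critical = STANDARD_NORMAL.inv_cdf(1 - level / 2)
    upper = 1 - STANDARD_NORMAL.cdf(critical - delta)
    lower = STANDARD_NORMAL.cdf(-critical - delta)
    return upper + lower

for delta in (-1.0, 0.0, 1.0, 2.0):
    print(delta, power_one_sided(delta), power_two_sided(delta))
```

For positive $\delta$ the one-sided test is the more powerful; for negative $\delta$ its power falls below the rejection level, which is the insensitivity that may be tolerated when errors in that part of the range are unimportant.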


NOTES AND REFERENCES 


The theory of this chapter is very largely due to Neyman and E. S. Pearson, whose
treatment has been closely followed. In their first contribution to the subject (1928) the
likelihood criterion was developed, the theory of first and second kind of errors and power
of tests being given in 1933. For the theory of unbiassed tests, see the papers of 1936 and
1938. In the last few years the literature has grown considerably.

Feller (1938) has shown that similar regions only exist in rather exceptional
circumstances and that the theory of composite hypotheses is incomplete. Tables of certain
power functions and distributions associated with likelihood tests are given by Mahalanobis
(1933), Neyman and Tokarska (1936a), Wilks and Thompson (1937a), P. C. Tang (1938),
David (1939), Nayer (1936), and in Tables for Statisticians, Part II (Tables 35-37). See
also Mahalanobis (1933).

For tests based on the likelihood ratio, see Neyman and Pearson (1928, 1931a, 1931b),
Pearson and Wilks (1933b), Wilks (1935a), Nayer (1936), Welch (1936a), R. W. Jackson
(1936), Sukhatme (1936b), Bartlett (1937c), Wilks and Thompson (1937), Wilks (1938a),
Bishop (1939), G. W. Brown (1939), Mood (1939), Hartley (1940), Wald and Brookner
(1941a).

For the general theory, see also Welch (1935), Kolodzieczyk (1935), Neyman (1935b,
1937b, 1938b), Daly (1940), Pitman (1939b), Wald (1939a, 1941a), Wolfowitz (1942), E. S.
Pearson (1941, 1942a), Dantzig (1940), P. L. Hsu (1941b), Simaika (1941), MacStewart
(1941), Scheffé (1942a, 1943).


EXERCISES 


26.1. Examine the following argument: To accept H when it is false is equivalent 
to rejecting not-H when not-H is true. Hence, if K = not-H, to commit an error of the 
second kind for H is to commit an error of the first kind for K ; and thus there is 
no distinction between the first and second kinds of error. 


26.2. For the distribution

\[ dF = \beta e^{-\beta(x-\gamma)}\,dx, \quad x \ge \gamma; \qquad dF = 0, \quad x < \gamma, \]

show that for a hypothesis $H_0$ that $\beta = \beta_0$, $\gamma = \gamma_0$ and an alternative $H_1$ that $\beta = \beta_1$,
$\gamma = \gamma_1$ the best critical region is the region $W_0$ where $p_0 = 0$, together with the region
$W_1$ defined by

\[ \bar{x} \le \frac{1}{\beta_1-\beta_0}\left\{\gamma_1\beta_1 - \gamma_0\beta_0 - \frac{1}{n}\log k + \log\frac{\beta_1}{\beta_0}\right\}, \]

provided that the admissible hypothesis is restricted by the conditions $\gamma_1 \le \gamma_0$, $\beta_1 \ge \beta_0$.
Hence show that a U.M.P. test exists in such circumstances.

(Neyman and Pearson, 1936a. This shows that a U.M.P. test can exist for more than one unknown
parameter.)


26.3. If the distribution function of z, . . . v, is given by 


Doe OD M Ari Bn SS CO 

Show that the frequency function may be put in the form 
2(7 — 2 n 
fem ( 2T en (id) 

and hence that @ is a “shared ” estimator sufficient for y and c. Show further that the 
best critical regions for ye, o, differ according as o? > Gj, o? < o? or o = o,, and that 
their boundaries depend on y. Hence no U.M.P. test exists for admissible alternatives 
o> 0. 


(Neyman and Pearson, 1936a.)




26.4. In the previous exercise put ø = y and consider the class of hypothesis y > 0. 
Show that there are different best critical regions according as y > y, y < y, and that 
their boundaries depend on y. Hence there is no U.M.P. test, but @ is sufficient for y. 

(Neyman and Pearson, 1936a.) 


26.5. In samples from a normal population, show that the probability of accepting
the hypothesis that the mean $\mu \le \mu_0$ when, in fact, it is false and $\mu = \mu_1 > \mu_0$ (that is,
the probability of an error of the second kind) is


K eu PESER E n-1 _ m? ais vp uA 
(2) aria." exp ( 95 ) v6] ..* du dv 


where poo 
o 


and ¢ is the value of ~ == corresponding to the significance level 1 — « for the control 


of errors of the first kind. 
(Neyman and Tokarska, 1936.) 


26.6. In six samples of six members each the following values were obtained:

Sample.     Mean.     $s_i^2$
   1        8433       24,722
   2        8200       94,133
   3        7933      149,733
   4        8120       45,037
   5        7971       88,480
   6        8203       49,921

with $s_0^2 = 104,588$, $s_a^2 = 75,338$.

Show that $\lambda_{H_1}^{\frac{2}{N}} = 0.8508$ and $\lambda_H^{\frac{2}{N}} = 0.6219$. The 5-per-cent. levels are respectively
0.67 and 0.54, so that there is no evidence of heterogeneity.

(Pearson, appendix to papers by Wilsdon, 1934).


26.7. Verify that the likelihood ratio leads to “Student's” test for an unknown
mean in normal samples, to the use of Fisher's z in testing the equality of two variances,
and to the t-test for the difference of two means in normal populations with the same
variance.


26.8. If samples $n_1 \ldots n_k$ are drawn from the populations

\[ dF = \frac{1}{\sigma_i}\exp\left(-\frac{x-\beta_i}{\sigma_i}\right)dx, \qquad i = 1 \ldots k, \]

use the likelihood ratio to test the hypothesis $H$ that the populations are identical,


showing that E 
IN =n, = g (z; — 2)" Him, say, 
(&—z)' — WW 


where $\bar{x}_i$ is the mean of the ith sample, $x_{i1}$ is the smallest member of that sample, $\bar{x}_0$ is the
mean of all samples together and $x_{01}$ is the smallest value in all samples together.
Show that the distribution of z;; and l is 
cl we s exp4 — m (l + vi) 
(n; — 2)!/ * o 
and hence the moments of L, are 


[r(a, 51 4 £s 
SQ Wer} E r(v 25-3) 
AT EESE ae ae 


If H, is the hypothesis that the populations have the same c but any possible different 
B's, show that 


where 7 is the weighted mean of the ls, and that 


al. "us 
UNPI(N E) T(m 253) 


Mp (Ly) = 
T(N —k Tn 
( + Pp) RP AT) 


If H, is the hypothesis that the populations, being known to have identical o's, have 
the same f, show that the distribution of 


is dF = — SAN S ar Rms L,)*-2 dL, 


I(N—EJP(k-1) 
(Sukhatme, 1936b). 
26.9. In the notation of 26.36 show that, if $H$ is true, the criteria $\lambda_{H_1}$ and $\lambda_{H_2}$ are
distributed independently.

(Neyman and Pearson, 1931a).


CHAPTER 27 
GENERAL THEORY OF SIGNIFICANCE-TESTS— (2) 


Bias in Statistical Tests

27.1. In considering the problem of estimation by confidence intervals in Chapter 19
we had occasion to remark on the rather arbitrary nature of determining the interval so
that both inequalities $\theta_0 < \theta$ and $\theta < \theta_1$ had an equal chance $\tfrac12\alpha$ of fulfilment. A point
of a similar nature arises in the testing of hypotheses, particularly when an asymmetrical
sampling distribution for the criterion is concerned. Consider, for instance, the testing
of the hypothesis that in a normal sample of n members the standard deviation $\sigma$ has an
assigned value $\sigma_0$ irrespective of the mean $\mu$. As we have seen in Example 26.3, there is
no U.M.P. test for all $\sigma > 0$, though there is one for $\sigma > \sigma_0$ and another for $\sigma < \sigma_0$. In
choosing a test to cover the whole range $\sigma > 0$ we have, therefore, a certain freedom of
choice, since there exists no “best” test as we have previously defined the term. A
common test in practical use is to take the sample variance $s^2$ and accept the hypothesis
$\sigma = \sigma_0$ if and only if

\[ s_1^2 \le s^2 \le s_2^2, \tag{27.1} \]

where $s_1^2$ and $s_2^2$ are determined from the distribution of $s^2$, namely

\[ dF \propto s^{n-3}\exp\left(-\frac{ns^2}{2\sigma_0^2}\right)d(s^2), \tag{27.2} \]

such that

\[ \int_0^{s_1^2}dF = \int_{s_2^2}^{\infty}dF = \tfrac12(1-\alpha), \tag{27.3} \]

i.e. $s_1^2$ and $s_2^2$ are chosen so as to cut off equal “tail” areas of the distribution. This
procedure will, of course, control errors of the first kind; but so equally well would the
selection of $s_1^2$ and $s_2^2$ so that

\[ \int_0^{s_1^2}dF = \tfrac12 - \alpha_1 \tag{27.4} \]

and

\[ \int_{s_2^2}^{\infty}dF = \tfrac12 - \alpha_2, \tag{27.5} \]

provided that $\alpha_1 + \alpha_2 = \alpha$. Thus we have an infinite number of regions which will control
errors of the first kind. It is natural to seek for some criterion which will distinguish one
as better than the others, recognizing that no U.M.P. test exists.

27.2. Such a criterion arises naturally from the following consideration. In the
test with $\alpha_1 = \alpha_2 = \tfrac12\alpha$, let us calculate the power of the test for different values
of $\sigma$. This may be done from the distributions of type (27.2) by means of the
incomplete $\Gamma$-function or the equivalent $\chi^2$ integral. For any given $\sigma$ we have to find

\[ \beta(s_1^2, s_2^2 \mid \sigma) = \int_0^{s_1^2}dF + \int_{s_2^2}^{\infty}dF, \tag{27.6} \]

where

\[ dF = \frac{1}{\Gamma\{\tfrac12(n-1)\}}\left(\frac{ns^2}{2\sigma^2}\right)^{\frac{n-3}{2}}\exp\left(-\frac{ns^2}{2\sigma^2}\right)d\left(\frac{ns^2}{2\sigma^2}\right). \tag{27.7} \]

Fig. 27.1, adapted from Neyman and Pearson (1936), shows the relation between
the power function $\beta$ and $\sigma^2$ for $\alpha_1 = \alpha_2 = 0.49$, $n = 3$, the rejection level being 0.02.

[Figure: power of test plotted against $\sigma^2$ in the sampled population (in units of $\sigma_0^2$).]

Fig. 27.1. Power Curve in Samples of 3 for $\sigma^2$ from a Normal Population (see text).


We see that for $\sigma^2 > 1 = \sigma_0^2$ the power increases, and so also for $\sigma^2 < \tfrac{1}{2} = \tfrac{1}{2}\sigma_0^2$. But
between $\tfrac{1}{2}\sigma_0^2$ and $\sigma_0^2$ the power is less than 0.02, i.e. less than $1 - \alpha$. Hence for such values
the chance of an error of the second kind, namely, the acceptance of a false hypothesis,
would be greater than the chance of an error of the first kind, namely, the rejection of
a true hypothesis.
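
The dip in the power curve can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes that for $n = 3$ the quantity $ns^2/\sigma^2$ is distributed as $\chi^2$ with 2 degrees of freedom (whose distribution function is $1 - e^{-x/2}$), takes $\sigma_0^2 = 1$, and cuts off tail areas of 0.01 at each end. The name `equal_tails_power` is purely illustrative.

```python
import math

TAIL = 0.01  # area cut off in each tail; rejection level 1 - alpha = 0.02

# Cut-offs for w = n s^2 / sigma_0^2, assumed chi-square with 2 d.f. under H_0.
# For 2 d.f. the distribution function is F(w) = 1 - exp(-w / 2).
W_LOWER = -2.0 * math.log(1.0 - TAIL)   # F(W_LOWER) = 0.01
W_UPPER = -2.0 * math.log(TAIL)         # 1 - F(W_UPPER) = 0.01


def equal_tails_power(var_ratio):
    """Probability of rejection when sigma^2 = var_ratio * sigma_0^2."""
    # n s^2 / sigma^2 is chi-square(2), so both cut-offs are divided by var_ratio.
    lower_tail = 1.0 - math.exp(-W_LOWER / (2.0 * var_ratio))
    upper_tail = math.exp(-W_UPPER / (2.0 * var_ratio))
    return lower_tail + upper_tail


for ratio in (0.25, 0.5, 0.75, 1.0, 2.0):
    print(f"sigma^2 = {ratio:4.2f} sigma_0^2: power = {equal_tails_power(ratio):.4f}")
```

Under these assumptions the computed power equals the rejection level at $\sigma^2 = \sigma_0^2$ and again at $\sigma^2 = \tfrac{1}{2}\sigma_0^2$, and lies below it in between, which is the range of bias described above.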


27.3. Whether this is felt to be anomalous depends on the relative importance of
the two kinds of error in particular cases; but, other things being equal, it may be felt
more important to avoid the second kind than the first, and not to have a greater probability
of accepting the hypothesis when it is false than of rejecting it when it is true. This, at any
rate, is the basis of the criterion which we proceed to discuss, namely, that the critical region
$w$ should be chosen so that $P(E \in w)$ is a minimum when the hypothesis tested is true.

Consider then the case when $H_0$ ascribes to a parameter $\theta$ the value $\theta_0$, and the
admissible alternatives ascribe other values to $\theta$ but do not differ from $H_0$ in other respects. We
shall say that $w$ is an unbiassed critical region if, and only if,


$$\int_w p(\theta_0)\,dx = P(E \in w \mid \theta_0) = 1 - \alpha, \tag{27.8}$$

and for any other $\theta$, say $\theta'$,

$$\int_w p(\theta')\,dx = P(E \in w \mid \theta') \ge 1 - \alpha. \tag{27.9}$$

Equation (27.8) expresses the usual control of errors of the first kind and (27.9) the
minimising property of $w$. If a region is not unbiassed it will be said to be biassed.


27.4. In certain cases there will exist among the unbiassed regions a $w$ such that

$$\int_w p(\theta')\,dx \ge \int_{w_1} p(\theta')\,dx \tag{27.10}$$

for every other unbiassed region $w_1$ and for all admissible $\theta'$. Such a region may be called
the best unbiassed critical region and the test based on it the uniformly most powerful
unbiassed test, or briefly the U.M.P.U. test. It minimises the risk of errors of the second
kind among the class of unbiassed tests. As we shall see presently, U.M.P.U. tests do in
fact exist in certain cases.

The use of the word "unbiassed" in this connection is rather arbitrary and is not to
be interpreted as meaning that biassed tests will give systematically wrong results, or that
unbiassed tests are based on unbiassed estimators. Fortunately the different uses of
the term "bias" usually occur in different contexts and confusion is infrequent.


Unbiassed Regions of Type A 
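27.5. Following Neyman and Pearson, we now define an unbiassed critical region
of Type A as one for which

$$\int_w p(\theta_0)\,dx = 1 - \alpha, \tag{27.11}$$

$$\left[\frac{\partial}{\partial\theta} \int_w p\,dx\right]_{\theta=\theta_0} = 0, \tag{27.12}$$

and

$$\left[\frac{\partial^2}{\partial\theta^2} \int_w p\,dx\right]_{\theta=\theta_0} \text{ is a maximum.} \tag{27.13}$$

We shall, as usual, assume that the differential coefficients exist and shall also assume that
differentiation may be carried out under the integral sign, so that we have for all $w$,

$$\frac{\partial}{\partial\theta} \int_w p\,dx = \int_w \frac{\partial p}{\partial\theta}\,dx = \int_w p'\,dx, \text{ say,} \tag{27.14}$$

and similarly for the second differential coefficient which we denote by $p''$.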

The first condition (27.11) controls errors of the first kind; the second makes the
region $w$ locally unbiassed; the third, (27.13), implies that as $\theta$ departs from $\theta_0$ the power
function increases more rapidly than for any other unbiassed critical region of the same
size. Thus in the neighbourhood of $\theta_0$ the test may be said to be better than others of the
unbiassed type. It may not be better for larger values of $|\theta - \theta_0|$, but the Type A tests
are based on the supposition that it is more important to detect small errors of the second
kind than to minimise the risk of large errors, which will probably be detected in any case.


27.6. The regions of Type A may be found by the use of the following theorem:

the region $w_0$ is an unbiassed critical region of Type A if, within $w_0$,

$$p''(\theta_0) \ge k_1\,p'(\theta_0) + k_2\,p(\theta_0) \tag{27.15}$$

and outside $w_0$,

$$p''(\theta_0) \le k_1\,p'(\theta_0) + k_2\,p(\theta_0), \tag{27.16}$$

where $p'(\theta_0) = [\partial p/\partial\theta]_{\theta=\theta_0}$, etc.,
and $k_1$, $k_2$ are chosen so as to satisfy (27.12) and (27.13).




Suppose that $F_0, F_1, \ldots, F_m$ are functions of $x_1, \ldots, x_n$ and that

$$\int_w F_i\,dx = c_i, \text{ a constant} \quad (i = 1, \ldots, m). \tag{27.17}$$

Let $w_0$ be a region such that inside it

$$F_0 \ge \sum_{i=1}^{m} k_i F_i \tag{27.18}$$

and outside it

$$F_0 \le \sum_{i=1}^{m} k_i F_i, \tag{27.19}$$

where the $k$'s are constants chosen so as to satisfy (27.17). Then for any $w$ for which
(27.17) is valid

$$\int_w F_0\,dx \le \int_{w_0} F_0\,dx. \tag{27.20}$$

In fact, let $ww_0$ be the common part, if any, of $w$ and $w_0$. As both $w$ and $w_0$ satisfy (27.17),
we have

$$\int_{w - ww_0} F_i\,dx = \int_{w_0 - ww_0} F_i\,dx. \tag{27.21}$$

Now

$$\int_{w_0} F_0\,dx - \int_w F_0\,dx = \int_{w_0 - ww_0} F_0\,dx - \int_{w - ww_0} F_0\,dx \ge \int_{w_0 - ww_0} \sum (k_i F_i)\,dx - \int_{w - ww_0} \sum (k_i F_i)\,dx = 0,$$

in virtue of (27.21).


In our present case take $F_0$ as $p''(\theta_0)$ and $F_1$, $F_2$ as $p'(\theta_0)$, $p(\theta_0)$ respectively. Then
(27.20) is true, and hence (27.13) is satisfied if (27.18) and (27.19) are true; and these will
be found to reduce to conditions (27.15) and (27.16). The theorem follows.


27.7. If (27.14) holds, and if there exists a sufficient estimator $t$ for $\theta$, then the
Type A region is bounded by surfaces of constant $t$. For then we have

$$p(\theta) = p_1(t, \theta)\,p_2(x) \tag{27.22}$$

and hence, from (27.15), on substitution,

$$p_1''(t, \theta_0) \ge k_1\,p_1'(t, \theta_0) + k_2\,p_1(t, \theta_0)$$

within $w_0$, and conversely outside it. The equality must hold on the boundary, which
is equivalent to the theorem.


27.8. Writing

$$\phi = \left[\frac{\partial}{\partial\theta} \log p\right]_{\theta=\theta_0}, \tag{27.23}$$

$$\phi' = \left[\frac{\partial^2}{\partial\theta^2} \log p\right]_{\theta=\theta_0}, \tag{27.24}$$

we have

$$p'(\theta_0) = \phi\,p(\theta_0), \qquad p''(\theta_0) = (\phi' + \phi^2)\,p(\theta_0),$$

and hence the inequality (27.15) reduces to

$$\phi' + \phi^2 \ge k_1\phi + k_2 \tag{27.25}$$

within $w_0$, wherever $p(\theta_0)$ does not vanish; and conversely outside $w_0$.
We may distinguish three special cases:

(a) If $\phi'$ is a function of $\phi$, say $F(\phi)$, we have

$$F(\phi) + \phi^2 - k_1\phi - k_2 \ge 0 \tag{27.26}$$

and the Type A region is bounded by the surfaces

$$\phi = c_j, \quad j = 1, \ldots, m, \tag{27.27}$$

where $m$ is the number of roots of (27.26). In this case, as we saw in 17.30, there exists
a sufficient estimator. It follows that $w_0$ is defined by inequalities of the type

$$c_j \le \phi \le c_{j+1}$$

and we may, as in 26.24, use the $\phi$'s as new co-ordinates and calculate the size of a region
from their distribution functions.

(b) As a simple case of (a), if

$$\phi' = A + B\phi \tag{27.28}$$

we find, for (27.26),

$$\phi^2 - k_1'\phi - k_2' \ge 0, \tag{27.29}$$

where $k_1' = k_1 - B$ and $k_2' = k_2 - A$, and the limits of $\phi$ are given by the two roots of this quadratic.

(c) If $\phi'$ cannot be expressed as a function of $\phi$ which does not involve the $x$'s explicitly,
we still have

$$\phi' \ge k_2 + k_1\phi - \phi^2. \tag{27.30}$$

In this case, considering $\phi$ and $\phi'$ as two co-ordinates of a point in a plane, we see that
the region for which (27.30) is true is the one "above" the parabola $\phi' = k_2 + k_1\phi - \phi^2$,
and that $k_1$, $k_2$ are determined by

$$\iint_{w_0} p(\phi, \phi')\,d\phi\,d\phi' = 1 - \alpha, \tag{27.31}$$

$$\iint_{w_0} \phi\,p(\phi, \phi')\,d\phi\,d\phi' = 0. \tag{27.32}$$

In this instance we can reduce the problem to two dimensions by using two new co-ordinates
$\phi$, $\phi'$.


Example 27.1 
Consider the normal distribution

$$dF = \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(x - \mu)^2\}\,dx.$$

To apply the foregoing theory with complete rigour we have to show that (27.14) is true.
We shall assume that this is so, referring the reader for a formal proof to Neyman and
Pearson (1936).

We have, then, with $\theta = \mu$,

$$\log p(\mu) = -\tfrac{1}{2} n \log(2\pi) - \tfrac{1}{2}\sum (x - \mu)^2,$$

$$\phi = \sum (x - \mu_0), \qquad \phi' = -n,$$

and hence this case reduces to that of (27.28). We write

$$\phi = n(\bar{x} - \mu_0),$$

and can clearly use $\bar{x}$ instead of $\phi$ as a co-ordinate, which confirms the result of 27.7 since
$\bar{x}$ is sufficient for $\mu$.

It follows that the unbiassed region of Type A is given by

$$\bar{x} \le \bar{x}_1, \qquad \bar{x} \ge \bar{x}_2,$$

where

$$\int_{\bar{x}_1}^{\bar{x}_2} p(\bar{x})\,d\bar{x} = \alpha$$

and

$$\int_{\bar{x}_1}^{\bar{x}_2} p(\bar{x})\,(\bar{x} - \mu_0)\,d\bar{x} = 0.$$


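Now if $H_0$ is true, that is if $\mu = \mu_0$, $\bar{x}$ is distributed in the form

$$dF = \sqrt{\frac{n}{2\pi}} \exp\{-\tfrac{1}{2} n (\bar{x} - \mu_0)^2\}\,d\bar{x}.$$

Hence $\bar{x}_1 - \mu_0 = -(\bar{x}_2 - \mu_0)$ and the Type A region is defined as being outside the range

$$\mu_0 - \frac{\lambda}{\sqrt{n}} \le \bar{x} \le \mu_0 + \frac{\lambda}{\sqrt{n}},$$

where $\lambda$ is given by

$$\frac{1}{\sqrt{2\pi}} \int_{-\lambda}^{\lambda} e^{-\frac{1}{2}x^2}\,dx = \alpha.$$

In this case the Type A test leads to the usual test based on equal tail areas. The
same test follows from the likelihood ratio, as the reader can verify for himself.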
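
As an illustration of the unbiassed property in this example, the power of the equal-tails test can be evaluated directly from the normal distribution function. The sketch below is an assumption-laden illustration rather than anything given in the text: it follows the convention used here that $\alpha$ is the probability of acceptance under $H_0$, and the function name `type_a_power` and the values $n = 25$, $\alpha = 0.95$ are chosen only for the demonstration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def type_a_power(delta, n, alpha):
    """Rejection probability of the equal-tails test when mu = mu_0 + delta.

    alpha is the acceptance probability under H_0, so the size of the
    critical region is 1 - alpha.
    """
    lam = STD_NORMAL.inv_cdf(0.5 + alpha / 2.0)   # P(|z| <= lam) = alpha
    shift = delta * n ** 0.5                      # mean of sqrt(n) * (xbar - mu_0)
    accept = STD_NORMAL.cdf(lam - shift) - STD_NORMAL.cdf(-lam - shift)
    return 1.0 - accept


for delta in (-0.5, -0.1, 0.0, 0.1, 0.5):
    print(delta, round(type_a_power(delta, n=25, alpha=0.95), 4))
```

The printed values are symmetrical about $\delta = 0$ and smallest there, as the theory requires of an unbiassed region.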


Example 27.2 


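If the distribution is normal with zero mean and variance $\sigma^2$, and $H_0$ is that $\sigma = \sigma_0$,
we find

$$\phi = -\frac{n}{\sigma_0} + \frac{1}{\sigma_0^3}\sum x^2 = \frac{1}{\sigma_0}(v - n), \text{ say,}$$

where $v = \sum x^2/\sigma_0^2$. This also satisfies (27.28), and the Type A region will be defined by

$$v \le v_1, \qquad v \ge v_2,$$

where

$$\int_{v_1}^{v_2} p(v)\,dv = \alpha$$

and

$$\int_{v_1}^{v_2} p(v)\,(v - n)\,dv = 0.$$

Here $p(v)$, the frequency function of the second moment, is

$$p(v) = \frac{1}{2^{\frac{1}{2}n}\,\Gamma(\tfrac{1}{2}n)}\,v^{\frac{1}{2}(n-2)}\,e^{-\frac{1}{2}v},$$

and we find, for the second equation,

$$\int_{v_1}^{v_2} v^{\frac{1}{2}n} e^{-\frac{1}{2}v}\,dv - n \int_{v_1}^{v_2} v^{\frac{1}{2}(n-2)} e^{-\frac{1}{2}v}\,dv = 0.$$

Integrating the first member by parts, $v^{\frac{1}{2}n}$ being one part, we are left with

$$\left[-2\,v^{\frac{1}{2}n} e^{-\frac{1}{2}v}\right]_{v_1}^{v_2} = 0$$

or

$$v_1^{\frac{1}{2}n} e^{-\frac{1}{2}v_1} = v_2^{\frac{1}{2}n} e^{-\frac{1}{2}v_2}.$$

This has to be solved in conjunction with

$$\int_{v_1}^{v_2} \frac{1}{2^{\frac{1}{2}n}\,\Gamma(\tfrac{1}{2}n)}\,v^{\frac{1}{2}(n-2)}\,e^{-\frac{1}{2}v}\,dv = \alpha.$$

The numerical solution can be carried out by successive approximation or graphically.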
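
One way of carrying out the successive approximation is sketched below. It is an illustration only and rests on stated assumptions: the sample size is taken as $n = 2$, so that the distribution function of $v$ has the closed form $1 - e^{-v/2}$; the two conditions solved are equality of $v^{\frac{1}{2}n}e^{-\frac{1}{2}v}$ at the two limits and an acceptance probability of $\alpha = 0.98$; and plain bisection is used. All function names are hypothetical.

```python
import math

N = 2          # sample size; v = sum(x^2) / sigma_0^2 is chi-square with N d.f.
ALPHA = 0.98   # acceptance probability under H_0, as in the text


def log_boundary_term(v):
    """Logarithm of v^(N/2) exp(-v/2), the quantity equated at v1 and v2."""
    return 0.5 * N * math.log(v) - 0.5 * v


def matching_upper_limit(v1):
    """Find v2 >= N with the same boundary term as v1, by bisection."""
    target = log_boundary_term(v1)
    lo, hi = float(N), 2.0 * N
    while log_boundary_term(hi) > target:   # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if log_boundary_term(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def acceptance_probability(v1, v2, var_ratio=1.0):
    """P(v1 < v < v2) when sigma^2 = var_ratio * sigma_0^2 (closed form, N = 2)."""
    return math.exp(-v1 / (2.0 * var_ratio)) - math.exp(-v2 / (2.0 * var_ratio))


def type_a_limits():
    """Bisect on v1 until the acceptance probability under H_0 equals ALPHA."""
    lo, hi = 1e-12, float(N)
    for _ in range(200):
        v1 = 0.5 * (lo + hi)
        if acceptance_probability(v1, matching_upper_limit(v1)) > ALPHA:
            lo = v1      # interval too wide: move v1 towards the mode at v = N
        else:
            hi = v1
    v1 = 0.5 * (lo + hi)
    return v1, matching_upper_limit(v1)


V1, V2 = type_a_limits()
print("Type A limits:", V1, V2)
for ratio in (0.8, 0.9, 1.0, 1.1, 1.25):
    print(ratio, 1.0 - acceptance_probability(V1, V2, ratio))
```

With these limits the rejection probability takes its smallest value at $\sigma = \sigma_0$, whereas limits cutting off equal tail areas would not satisfy the boundary condition.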

In this connection Fig. 27.2 is of interest. It shows, for samples of two and $\alpha = 0.98$,
the graphs of the power function for the ordinary test with equal tail areas, in addition to
the power functions for the Type A test, the U.M.P. test with $\sigma > \sigma_0$ and the U.M.P. test
with $\sigma < \sigma_0$.

Evidently, for $\sigma > \sigma_0$ the best critical region (2) has the greatest power (as it must
have), and for $\sigma < \sigma_0$ the best region (1) has the greatest power. The test based on equal
tail areas has a greater power than the Type A test for $\sigma > \sigma_0$ but a lower power for $\sigma < \sigma_0$,
besides being biassed, as we have seen.

Fig. 27.2. Power Curves of Four Different Tests of the Variance in Normal Samples of 2 (see text).

As $n$ becomes larger the same effects persist, but the Type A and the "equal tails"
tests become closer together in power. For samples of 20 or more there seems to be no
serious loss in using the latter since the range of bias and its magnitude are then very small.
If, of course, we knew in practice that $\sigma > \sigma_0$ we should use the U.M.P. test, and cases may
arise, even when such knowledge is lacking, where "one-sided" hypotheses of this kind
are all that concern us.


Invariance Theorem for Type A Regions 

27.9. It is important to show that the regions selected on the basis of Type A criteria
conform to corresponding criteria if some other function $\phi(\theta)$ is used instead of $\theta$ itself.
In Example 27.2, for instance, where we took $\theta$ to be the standard deviation $\sigma$, should we
have obtained the same regions if we had taken $\theta$ to be the variance $\sigma^2$? The answer is
affirmative under certain general conditions, as we should expect from the relationship
with sufficient estimators.

Suppose we have a new parameter $\phi$, given by

$$\theta = \theta_0 + f(\phi) = \psi(\phi), \tag{27.33}$$

where $f(0) = 0$. Then if $p(\psi)$ satisfies (27.14) and the similar equation in second
differentials, if $\psi$ is monotonically increasing and $[\partial\psi/\partial\phi]_{\phi=0} > 0$, then the region based on $\phi$ is an
unbiassed critical region if that based on $\theta$ is so. It is sufficient to show that (27.15)
and (27.16) are satisfied for $\phi$. Now

$$\theta = \theta_0 \ (\phi = 0), \qquad \left[\frac{\partial\psi}{\partial\phi}\right]_{\phi=0} = \psi', \qquad \left[\frac{\partial^2\psi}{\partial\phi^2}\right]_{\phi=0} = \psi'' \ \text{(say)}. \tag{27.34}$$

Thus

$$p_\phi'(E \mid \psi(0)) = p_\theta'(E \mid \theta_0)\,\psi'$$

and

$$p_\phi''(E \mid \psi(0)) = p_\theta''(E \mid \theta_0)\,\psi'^2 + p_\theta'(E \mid \theta_0)\,\psi''.$$

Solving these for $p_\theta'$ and $p_\theta''$ and substituting in (27.15) and (27.16), we find

$$p_\phi''(E \mid \psi(0)) \ge k_1'\,p_\phi'(E \mid \psi(0)) + k_2'\,p(E \mid \psi(0)) \tag{27.35}$$

within $w_0$ and the contrary outside, where

$$k_1' = k_1\psi' + \frac{\psi''}{\psi'}, \qquad k_2' = k_2\,\psi'^2. \tag{27.36}$$

The result follows.


Regions of Type $A_1$

27.10. The regions of Type A are determined so that tests based on them are
U.M.P.U. in the neighbourhood of $\theta_0$. We now consider a region, said to be of Type $A_1$,
which is U.M.P.U. everywhere, i.e. which obeys (27.11) and (27.12) but has, in place of
(27.13),

$$\int_{w_0} p\,dx \ge \int_w p\,dx \tag{27.37}$$

for every admissible $\theta$ and every $w$ satisfying the other two conditions.

It is conceivable that (27.37) does not entail the existence of a U.M.P.U. test, for there
might be an unbiassed region of size $1 - \alpha$ for which the derivative of $\int p\,dx$ did not exist
at $\theta = \theta_0$, but which nevertheless gave a more powerful test. This refinement, however,
need not detain us.


27.11. If $W_+$ represents the sample-space where the density is not zero, if

$$\phi' = A + B\phi,$$

and if $\phi(\theta_0)$ does not vanish identically in $W_+$, then the unbiassed critical region of Type A
is necessarily of Type $A_1$.

Let $w_0$ be the Type A region, which is determined ex hypothesi by two numbers $c_1$
and $c_2$ such that

$$c_1 \le \phi_0 \le c_2 \text{ outside } w_0.$$

We have to show that

$$\int_{w_0} p\,dx \ge \int_w p\,dx$$

for all admissible $\theta$ and any $w$ for which

$$\int_w p_0\,dx = 1 - \alpha, \tag{27.38}$$

with the consequence that

$$\int_w p_0'\,dx = 0. \tag{27.39}$$
w 


Since $\phi' = A + B\phi$ we have, solving this equation as a linear differential equation
of the first degree,

$$\phi = \left\{\int A \exp\left(-\int B\,d\theta\right) d\theta + T\right\} \exp \int B\,d\theta. \tag{27.40}$$

The reader may verify that this is a solution, and since it contains the arbitrary constant
$T$ it is the most general solution. It follows that we may write

$$\log p = P(\theta) + T\,Q(\theta) + f(x), \text{ say,} \tag{27.41}$$

where $P$ and $Q$ do not depend upon $x$. We then have, primes denoting differentiation with
respect to $\theta$ and the suffix 0 relating to $\theta_0$,

$$\phi_0 = P_0' + T\,Q_0'. \tag{27.42}$$

We note that $Q_0'$ cannot be zero, for if it were we should have

$$0 = \int \phi_0\,p_0\,dx = P_0' \int p_0\,dx = P_0',$$

which would imply that $\phi_0$ was identically zero.

In virtue of the lemma of 27.6, the proposition will be proved if we can show that
for fixed $\theta$ and $\theta_0$ there are two numbers $a$ and $b$, depending on $\theta$ and $\theta_0$ but not on the
$x$'s, such that

$$p \ge p_0\,(a\phi_0 + b) \text{ inside } w_0, \tag{27.43}$$

and the contrary outside $w_0$. Putting the values of $p$ and $\phi_0$ in this expression, we have
to show that $a$ and $b$ can be found such that, inside $w_0$,

$$\exp\{P(\theta) + T\,Q(\theta) + f(x)\} \ge \exp\{P(\theta_0) + T\,Q(\theta_0) + f(x)\}\,\{aP_0' + aTQ_0' + b\}$$

or, writing $r = P(\theta) - P(\theta_0)$, $q = Q(\theta) - Q(\theta_0)$, such that

$$\exp(r + qT) \ge aQ_0'T + aP_0' + b = a_1 T + b_1, \text{ say.} \tag{27.44}$$

Here $q$ cannot be zero, for if it were $Q(\theta)$ would be equal to $Q(\theta_0)$ and, integrating the
frequency functions over $W$, we should find $r = 0$. The alternative hypothesis would
not then differ essentially from $H_0$.
Consider at the outset the case when $c_1$ and $c_2$ are different. From (27.42) we see
that $\phi_0$ depends only on $T$ so far as variation in $x$ is concerned, and that

$$\phi_0 = c_1 \text{ when } T = \frac{c_1 - P_0'}{Q_0'} = T_1 \text{ (say)}, \tag{27.45}$$

$$\phi_0 = c_2 \text{ when } T = \frac{c_2 - P_0'}{Q_0'} = T_2 \text{ (say)}. \tag{27.46}$$

$T_1$ and $T_2$ are different. Choose $a_1$ and $b_1$ so as to satisfy

$$a_1 T_1 + b_1 = e^{r + qT_1}, \qquad a_1 T_2 + b_1 = e^{r + qT_2}.$$

Then (27.44) is satisfied at the boundary points and we have merely to prove that

$$c_1 < \phi_0 < c_2 \text{ implies } e^{r + qT} < a_1 T + b_1; \qquad \phi_0 < c_1 \text{ and } \phi_0 > c_2 \text{ imply } e^{r + qT} > a_1 T + b_1. \tag{27.47}$$

This follows from the fact that

$$y = e^{r + qT} - a_1 T - b_1 \tag{27.48}$$

has only one minimum, between $T_1$ and $T_2$, as may be seen by differentiating it twice, for
the second derivative is positive and hence the first is a monotonically increasing function.
But $y$ vanishes at $T_1$ and $T_2$ and hence is negative between those values and positive
outside them.

Finally, if $c_1$ and $c_2$ are equal, say to $c$, we choose $a_1$ and $b_1$ so as to satisfy

$$P_0' + Q_0' T_1 = c, \qquad q\,e^{r + qT_1} - a_1 = 0, \qquad e^{r + qT_1} - a_1 T_1 - b_1 = 0. \tag{27.49}$$

It will be found that $y$ has a minimum at $T = T_1$ and vanishes there. It follows that in
the region complementary to $w_0$, where $\phi_0 = c$, we have

$$e^{r + qT} = a_1 T + b_1,$$

and thus in $w_0$, where $\phi_0 < c$ or $c < \phi_0$, the left-hand side must be greater than the
right-hand side. The demonstration is complete.


Example 27.3 


Consider again the data of Example 27.2. We have already seen that for this
distribution $\phi' = A + B\phi$, so that the regions of Type A are also of Type $A_1$. Among
unbiassed tests of the hypothesis this is the uniformly most powerful test.


Composite Hypotheses: Regions of Type B 
27.12. We now consider the extension of the foregoing results to the case when
$H_0$ is composite. For simplicity we will suppose that there are two parameters $\theta_1$ and $\theta_2$,
$H_0$ specifying $\theta_1$ as say $\theta_{10}$ and leaving $\theta_2$ undetermined. Then a region $w_0$ will be said
to be of Type B if

(a) $\int_{w_0} p(\theta_{10}, \theta_2)\,dx = 1 - \alpha$ for all admissible $\theta_2$; (27.50)

(b) $\int_{w_0} p(\theta_1, \theta_2)\,dx$ may be differentiated twice with respect to $\theta_1$ under the integral
sign;

(c) $\left[\dfrac{\partial}{\partial\theta_1} \int_{w_0} p(\theta_1, \theta_2)\,dx\right]_{\theta_1=\theta_{10}} = 0$; (27.51)

(d) for any other region $w$ satisfying (27.50),

$$\left[\frac{\partial^2}{\partial\theta_1^2} \int_{w_0} p\,dx\right]_{\theta_1=\theta_{10}} \ge \left[\frac{\partial^2}{\partial\theta_1^2} \int_w p\,dx\right]_{\theta_1=\theta_{10}}. \tag{27.52}$$

These conditions are obvious generalisations of those defining Type A. Putting now

$$\phi_1 = \frac{\partial}{\partial\theta_1} \log p, \tag{27.53}$$

$$\phi_2 = \frac{\partial}{\partial\theta_2} \log p, \tag{27.54}$$

we state that the Type B region will exist and may be found if $\phi_1$ and $\phi_2$ are algebraically
independent, if

$$\phi_{11} = A_1 + A_2\phi_1 + A_3\phi_2, \qquad \phi_{12} = B_1 + B_2\phi_1 + B_3\phi_2, \qquad \phi_{22} = C_1 + C_2\phi_2, \tag{27.55}$$

and if the law of distribution of $\phi_2$ is uniquely determined by its moments. We omit the
proof of this theorem, for which see Neyman (1935b).


Simple Hypotheses with Two Parameters: Regions of Type C 

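27.13. The extension of the foregoing theory to the case of a simple hypothesis
specifying several parameters presents some new features. Again to simplify the discussion
we shall consider two parameters, $\theta_1$ and $\theta_2$.

Consider the power function in the neighbourhood of $\theta_1 = \theta_2 = 0$ which we will suppose
to be the values specified by $H_0$. Writing for the function

$$\beta(\theta_1, \theta_2 \mid w) = \int_w p(\theta_1, \theta_2)\,dx, \tag{27.56}$$

$$\beta_j = \left[\frac{\partial\beta}{\partial\theta_j}\right]_{\theta_1=\theta_2=0}, \quad j = 1, 2, \tag{27.57}$$

$$\beta_{jk} = \left[\frac{\partial^2\beta}{\partial\theta_j\,\partial\theta_k}\right]_{\theta_1=\theta_2=0}, \quad j, k = 1, 2, \tag{27.58}$$

we have, assuming an expansion by Taylor's theorem,

$$\beta(\theta_1, \theta_2 \mid w) = \beta(0, 0 \mid w) + \theta_1\beta_1(w) + \theta_2\beta_2(w) + \tfrac{1}{2}\{\theta_1^2\beta_{11}(w) + 2\theta_1\theta_2\beta_{12}(w) + \theta_2^2\beta_{22}(w)\} + \ldots \tag{27.59}$$

To extend the idea of unbiassed tests to such a case we require in the first place

$$\beta_1(w) = 0, \qquad \beta_2(w) = 0. \tag{27.60}$$

Secondly, there will be a minimum at $\theta_1 = \theta_2 = 0$ if

$$\beta_{12}^2 - \beta_{11}\beta_{22} < 0 \tag{27.61}$$

and

$$\beta_{11} > 0. \tag{27.62}$$

If these conditions are satisfied the power function for small values of $\theta_1$ and $\theta_2$ is effectively

$$\beta(\theta_1, \theta_2 \mid w) = 1 - \alpha + \tfrac{1}{2}\{\theta_1^2\beta_{11} + 2\theta_1\theta_2\beta_{12} + \theta_2^2\beta_{22}\}. \tag{27.63}$$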


We may represent this diagrammatically as in Fig. 27.3, which shows one of the ellipses
for which the power function is constant.

Since the hypothesis $H_0$ is that $\theta_1 = \theta_2 = 0$, we may speak of the value $\theta_1$ as the "error
in $\theta_1$", and similarly for $\theta_2$; and if, as in the case depicted, the co-ordinate axes are not
the same as the principal axes of the ellipse it is clear that for values of $\theta_2$ which are not
zero, errors of positive and negative sign in $\theta_1$ are not equal. From this viewpoint it may
be said that the minimisation of the power function does not control positive or negative
errors to the same extent; for the points A and B in Fig. 27.3 lie on the ellipse of constant
$\beta$, so that the probability of detecting them is the same, though A represents a positive
"error" in $\theta_1$ greater than the negative "error" given by B.

Fig. 27.3. Ellipse of Constant Power for Simple Hypothesis with Two Parameters (see text).


27.14. Whether this is a desirable property of the test depends to some extent on
what the test is intended to do. To avoid the anomaly we must require that

$$\beta_{12} = 0. \tag{27.64}$$

Furthermore, even if this condition is satisfied and the principal axes of the ellipse coincide
with the co-ordinate axes, there may still appear anomalies if the length of one axis is greater
than that of the other; for then errors in one parameter are not detected as frequently
as errors of the same size in the other. Here again it is a matter of particular circumstance
whether such an effect is regarded as objectionable. (We disregard the fact that it can
be removed by appropriate scaling of the parameters, which may or may not be artificial.)
To remove it we must require that

$$\beta_{11} = \beta_{22}, \tag{27.65}$$

so that the ellipses reduce to circles.

We may refer to the ellipses as "curves of equidetectability."


27.15. With the foregoing explanation in mind we define $w_0$ as a regular unbiassed
critical region of Type C if it obeys the conditions

$$\beta_1(w_0) = \beta_2(w_0) = 0, \tag{27.66}$$

$$\beta_{12}(w_0) = 0, \tag{27.67}$$

$$\beta_{11}(w_0) = \beta_{22}(w_0), \tag{27.68}$$

and if, for any other region $w$ obeying these three conditions and for which

$$\beta(0, 0 \mid w) = \beta(0, 0 \mid w_0) = 1 - \alpha, \tag{27.69}$$

we have

$$\beta_{11}(w_0) \ge \beta_{11}(w). \tag{27.70}$$

Secondly, if a region $w_1$ possesses the property that

$$\beta_1(w_1) = \beta_2(w_1) = 0, \tag{27.71}$$

$$\beta_{12}^2(w_1) - \beta_{11}(w_1)\,\beta_{22}(w_1) < 0, \tag{27.72}$$

and for any other region $w$ obeying the conditions

$$\beta(0, 0 \mid w) = \beta(0, 0 \mid w_1) = 1 - \alpha, \tag{27.73}$$

$$\frac{\beta_{11}(w_1)}{\beta_{11}(w)} = \frac{\beta_{12}(w_1)}{\beta_{12}(w)} = \frac{\beta_{22}(w_1)}{\beta_{22}(w)}, \tag{27.74}$$

we have

$$\beta_{11}(w_1) \ge \beta_{11}(w), \tag{27.75}$$

we shall say that $w_1$ is a non-regular unbiassed critical region of Type C.

These equations are analytical ways of saying that the regular region of Type C is
the one, among all regions having circular curves of equidetectability, which has the smallest
radius for any given value of the power function; whereas the non-regular region of Type C
is the one, among all regions having similar ellipses of equidetectability, which has the
smallest axes.


27.16. We now state without proof theorems similar to those demonstrated above
for the case of a single parameter.

Write

$$p_{ij} = \left[\frac{\partial^2 p}{\partial\theta_i\,\partial\theta_j}\right]_{\theta_1=\theta_2=0}, \text{ etc.}$$

Then $w_0$ is a regular unbiassed critical region of Type C if

(a) inside $w_0$

$$p_{11} \ge k_1(p_{11} - p_{22}) + k_2\,p_{12} + k_3\,p_1 + k_4\,p_2 + k_5\,p, \tag{27.76}$$

and outside $w_0$ the inequality is reversed;

(b) $$\int_{w_0} p_j\,dx = \int_{w_0} p_{12}\,dx = \int_{w_0} (p_{11} - p_{22})\,dx = 0, \quad j = 1, 2. \tag{27.77}$$

Secondly, if $w_1$ satisfies the conditions

(a) that inside $w_1$

$$p_{11} \ge k_1(\gamma_{12}\,p_{11} - \gamma_{11}\,p_{12}) + k_2(\gamma_{22}\,p_{11} - \gamma_{11}\,p_{22}) + k_3\,p_1 + k_4\,p_2 + k_5\,p, \tag{27.78}$$

and outside $w_1$ the inequality is reversed, the $k$'s as usual being constants and the $\gamma$'s obeying
the conditions

$$\gamma_{11} > 0, \qquad \gamma_{12}^2 - \gamma_{11}\gamma_{22} < 0;$$

(b) $$\int_{w_1} p_j\,dx = \int_{w_1} (\gamma_{12}\,p_{11} - \gamma_{11}\,p_{12})\,dx = \int_{w_1} (\gamma_{22}\,p_{11} - \gamma_{11}\,p_{22})\,dx = 0, \tag{27.79}$$

then $w_1$ is a non-regular unbiassed critical region of Type C, having ellipses of
equidetectability determined by

$$\gamma_{11}\theta_1^2 + 2\gamma_{12}\theta_1\theta_2 + \gamma_{22}\theta_2^2 = \text{constant}. \tag{27.80}$$

27.17. The theorem of invariance of 27.9 no longer holds in general for the present
case. If we transform to new parameters $\zeta_1$ and $\zeta_2$, the equations of transformation

$$\frac{\partial}{\partial\zeta_1} = \frac{\partial\theta_1}{\partial\zeta_1}\frac{\partial}{\partial\theta_1} + \frac{\partial\theta_2}{\partial\zeta_1}\frac{\partial}{\partial\theta_2},$$

etc. will not transform an ellipse co-axial with the co-ordinate axes $\theta_1$, $\theta_2$ into one co-axial
with $\zeta_1$, $\zeta_2$. Thus, in general, the effect of a transformation is to make a regular Type C
region into a non-regular Type C region.


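27.18. As usual, the conditions for the Type C region may be simply written in terms
of the derivatives of $\log p$. Write

$$\phi_i = \left[\frac{\partial \log p}{\partial\theta_i}\right]_{\theta_1=\theta_2=0}, \quad i = 1, 2, \tag{27.81}$$

$$\phi_{ij} = \left[\frac{\partial^2 \log p}{\partial\theta_i\,\partial\theta_j}\right]_{\theta_1=\theta_2=0}, \quad i, j = 1, 2. \tag{27.82}$$

Then if

$$\phi_{ij} = A_{ij} + B_{ij}\phi_1 + C_{ij}\phi_2 \tag{27.83}$$

we shall have

$$p_{ij} = (\phi_i\phi_j + A_{ij} + B_{ij}\phi_1 + C_{ij}\phi_2)\,p \tag{27.84}$$

and the inequality (27.76) becomes

$$(1 - k_1)\phi_1^2 - k_2\phi_1\phi_2 + k_1\phi_2^2 - k_3'\phi_1 - k_4'\phi_2 - k_5' \ge 0, \tag{27.85}$$

where the $k'$ are new constants easily expressible in terms of the old. They must be
determined so as to satisfy (27.77), which reduce to

$$\int_{w_0} \phi_j\,p\,dx = \int_{w_0} (\phi_1\phi_2 + A_{12})\,p\,dx = \int_{w_0} \{\phi_1^2 - \phi_2^2 + (A_{11} - A_{22})\}\,p\,dx = 0. \tag{27.86}$$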


Example 27.4 


Suppose we have a sample of $n_1$ from a normal population with mean $\mu_1$ and unit
variance and a second sample of $n_2$ from a normal population with mean $\mu_2$ and also unit
variance. The simple hypothesis to be tested is $\mu_1 = \mu_2 = \mu_0$, where $\mu_0$ is some specified
value. We consider two cases:

(i) in which errors of the same size in $\mu_1$ and $\mu_2$ are equally important;

(ii) in which, for some reason, there is a stronger desire to avoid errors in $\mu_2$ than
in $\mu_1$ and that therefore a greater number $n_2$ of members has been taken in the second
sample. We also assume that the sizes of errors judged of equal importance are
inversely proportional to $\sqrt{n}$, so that we are led to consider new parameters

$$\eta_1 = (\mu_1 - \mu_0)\sqrt{n_1}, \qquad \eta_2 = (\mu_2 - \mu_0)\sqrt{n_2}. \tag{27.87}$$

Case 1. The frequency function is

$$p = (2\pi)^{-\frac{1}{2}(n_1+n_2)} \exp\left[-\tfrac{1}{2}\left\{\sum_{i=1}^{n_1}(x_i - \mu_1)^2 + \sum_{i=n_1+1}^{n_1+n_2}(x_i - \mu_2)^2\right\}\right].$$

It will be found that

$$\phi_1 = n_1(\bar{x}_1 - \mu_0); \qquad \phi_2 = n_2(\bar{x}_2 - \mu_0);$$

$$\phi_{11} = -n_1 = A_{11}; \qquad \phi_{12} = 0 = A_{12}; \qquad \phi_{22} = -n_2 = A_{22};$$

$$B_{ij} = C_{ij} = 0.$$

From (27.85) we then find

$$(1 - k_1)n_1^2(\bar{x}_1 - \mu_0)^2 - k_2 n_1 n_2(\bar{x}_1 - \mu_0)(\bar{x}_2 - \mu_0) + k_1 n_2^2(\bar{x}_2 - \mu_0)^2 - k_3 n_1(\bar{x}_1 - \mu_0) - k_4 n_2(\bar{x}_2 - \mu_0) - k_5 \ge 0. \tag{27.88}$$

The law of distribution of $\bar{x}_1$ and $\bar{x}_2$ may be written

$$p \propto \exp[-\tfrac{1}{2}\{n_1(\bar{x}_1 - \mu_0)^2 + n_2(\bar{x}_2 - \mu_0)^2\}]. \tag{27.89}$$

Put $u = \sqrt{n_1}(\bar{x}_1 - \mu_0)$ and $v = \sqrt{n_2}(\bar{x}_2 - \mu_0)$.
Then the region $w_0$ is determined by

$$(1 - k_1)n_1 u^2 - k_2 uv\sqrt{n_1 n_2} + k_1 n_2 v^2 - k_3 u\sqrt{n_1} - k_4 v\sqrt{n_2} - k_5 \ge 0, \tag{27.90}$$

where

$$\iint_{w_0} p(u, v)\,du\,dv = 1 - \alpha,$$

$$\iint_{w_0} u\,p(u, v)\,du\,dv = \iint_{w_0} v\,p(u, v)\,du\,dv = \iint_{w_0} uv\,p(u, v)\,du\,dv = 0, \tag{27.91}$$

$$\iint_{w_0} (n_1 u^2 - n_2 v^2)\,p(u, v)\,du\,dv = (1 - \alpha)(n_1 - n_2), \tag{27.92}$$

and

$$p(u, v) = \frac{1}{2\pi} \exp\{-\tfrac{1}{2}(u^2 + v^2)\}.$$

It is evident from (27.90) that in the $(u, v)$ plane the boundary of $w_0$ is a conic. From
(27.91) we see that it must be coaxial with the co-ordinate axes and have its centre at the
origin. Hence $k_2 = k_3 = k_4 = 0$. Finally from (27.92) we find that the boundary is
of the form

$$\frac{u^2}{a^2} + \frac{v^2}{b^2} = 1, \tag{27.93}$$

, 1 n USE) TORTE 2 a 
where A É j ee e $ 5 (27.94) 


The Type C regions are then defined by (27.93), but we have to express $a$ and $b$ in terms
of known constants, including the probability level $1 - \alpha$. We have to satisfy (27.92),
and will show that a solution always exists.

Put

$$F(a, b) = \frac{1}{2\pi} \iint_{w_0} (n_1 u^2 - n_2 v^2) \exp\{-\tfrac{1}{2}(u^2 + v^2)\}\,du\,dv - (n_1 - n_2)(1 - \alpha). \tag{27.95}$$

If the boundary of $w_0$ is a circle, its radius is easily found to be

$$a = b = \sqrt{\{-2 \log(1 - \alpha)\}}.$$

The value of $F(a, b)$ outside this circle, by the substitution $u = r\cos\psi$, $v = r\sin\psi$, is
found to be

$$F(a, a) = (n_1 - n_2)\,\frac{1}{2\pi} \iint_{u^2+v^2>a^2} u^2 \exp\{-\tfrac{1}{2}(u^2 + v^2)\}\,du\,dv - (n_1 - n_2)(1 - \alpha) = (1 - \alpha)(n_1 - n_2)\,\tfrac{1}{2}a^2.$$

Now taking $w_0$ as the space outside the parallel lines

$$v = \pm\lambda,$$

which is given by $a$ infinite, so that $\dfrac{1}{\sqrt{2\pi}} \int_{-\lambda}^{\lambda} e^{-\frac{1}{2}x^2}\,dx = \alpha$, we find

$$F(\infty, \lambda) = -(n_1 - n_2)(1 - \alpha) + \frac{n_1}{2\pi} \iint_{w_0} u^2 \exp\{-\tfrac{1}{2}(u^2 + v^2)\}\,du\,dv - \frac{n_2}{2\pi} \iint_{w_0} v^2 \exp\{-\tfrac{1}{2}(u^2 + v^2)\}\,du\,dv = -n_2\,\lambda \sqrt{\frac{2}{\pi}}\,e^{-\frac{1}{2}\lambda^2} < 0.$$

Similarly,

$$F(\lambda, \infty) = n_1\,\lambda \sqrt{\frac{2}{\pi}}\,e^{-\frac{1}{2}\lambda^2} > 0.$$

Thus, since $F(a, b)$ is continuous it must vanish somewhere in the range $\lambda \le a \le \infty$,
$\lambda \le b \le \infty$. The values for which it does so define the Type C region.
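
The limiting value of $F$ for the region outside the parallel lines $v = \pm\lambda$ can be checked numerically. The sketch below is an illustration under stated assumptions, not the author's procedure: it takes $u$ and $v$ to be independent unit normal variates under $H_0$, so that the $u^2$ term contributes $n_1(1 - \alpha)$, and it evaluates the tail integral of $v^2$ by the trapezoidal rule, comparing the result with the closed form $-n_2\lambda\sqrt{2/\pi}\,e^{-\frac{1}{2}\lambda^2}$. The function names and the values of $n_1$, $n_2$, $\alpha$ used in the call are arbitrary.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tail_second_moment(lam, upper=12.0, steps=40000):
    """Trapezoidal value of the integral of v^2 * phi(v) over |v| > lam."""
    h = (upper - lam) / steps

    def integrand(v):
        return v * v * STD_NORMAL.pdf(v)

    total = 0.5 * (integrand(lam) + integrand(upper))
    for i in range(1, steps):
        total += integrand(lam + i * h)
    return 2.0 * total * h      # both tails, by symmetry


def f_parallel_lines(n1, n2, alpha):
    """Return (numerical, closed-form) values of F for the region |v| > lam."""
    lam = STD_NORMAL.inv_cdf(0.5 + alpha / 2.0)   # P(|v| <= lam) = alpha
    size = 1.0 - alpha
    # E[u^2] = 1 and u is independent of v, so the u^2 term gives n1 * size.
    numeric = n1 * size - n2 * tail_second_moment(lam) - (n1 - n2) * size
    closed = -n2 * lam * math.sqrt(2.0 / math.pi) * math.exp(-0.5 * lam * lam)
    return numeric, closed


print(f_parallel_lines(10, 15, 0.95))
```

The two values agree to within the accuracy of the quadrature and are negative whatever the sample sizes, which is the sign needed for the continuity argument.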


Case 2. In this case, using the parameters $\eta_1$ and $\eta_2$ of (27.87), we find

$$\phi_1 = u, \qquad \phi_2 = v,$$

$$\phi_{11} = -1, \qquad \phi_{12} = 0, \qquad \phi_{22} = -1.$$

The inequality becomes

$$(1 - k_1)u^2 - k_2 uv + k_1 v^2 - k_3 u - k_4 v - k_5 \ge 0,$$

where

$$\iint_{w_0} (u^2 - v^2)\,p(u, v)\,du\,dv = 0.$$

In a similar way it follows that the Type C region is the one lying outside the circle

$$u^2 + v^2 = -2 \log(1 - \alpha).$$

We leave the verification of this result to the reader.


Certain Limiting Properties 

27.19. From the foregoing examples it will be seen that in certain cases the optimum
critical regions are by no means easy to determine numerically; and it is not always clear
that the labour involved is repaid by the results. Some consideration has been given by
various writers to tests which have optimum properties for large $n$, the presumption being
that the same tests will be good, if not the best, for small values. As usual when several
limiting processes are involved simultaneously, the rigorous enunciation and proof of
theorems in this field is a matter of some complexity, and we shall here merely indicate
some of the results in very general terms without including proofs.

It has been shown by Neyman (1938b) that there do exist tests which are unbiassed
in the limit, and rules have been given for finding them. It has also been shown by Wald
(1941a) that there exist tests which are most powerful in the limit, and that such as are
based on maximum likelihood estimators are of this class. The tests are uniformly most
powerful for the single parameter $\theta > \theta_0$, and for $\theta < \theta_0$, but not both; and for any range
they are the most powerful unbiassed tests in the limit. Furthermore, the Type A test
tends to the most powerful unbiassed form.

The general conclusion seems to be that, even where the variation is not normal, most
of the tests in current use which are based on likelihood estimators have optimum properties
in the limit, and may therefore be used confidently for moderate or large samples. For
small samples the position is not so clear, particularly for non-normal variation. Tests
based on inefficient estimators are presumably less satisfactory; and for the non-parametric
case there is as yet no complete theory. On this latter question reference may be
made to a useful review by Scheffé (1943).

The Unbiassed Character of Likelihood-ratio Tests

27.20. It is of some interest to consider how far the tests based on likelihood (26.35)
are unbiassed.

It has been shown (Pitman, 1939b; Brown, 1939) that the Neyman-Pearson test in
the problem of k samples based on 2y, is biassed unless all the samples are of the same size ; 
but that Bartlett’s modification (26.42) is unbiassed. We prove this in 27.25 below. 
On the other hand, Daly (1940) has shown that in certain multivariate tests such as those 
of regressions, multiple correlations, Hotelling's T' (which we introduce in the next chapter), 
and the ordinary analysis of variance and covariance for orthogonal or non-orthogonal 
data, the likelihood-ratio tests are unbiassed, at least in the Type A sense (i.e. locally) 
and in some cases completely so.


Pitman’s Method for Location and Scale Parameters 
27.21. In the special but not uncommon case where the hypotheses under test con- 
cern parameters of scale or location, a simplified approach is possible. Suppose the joint 
distribution of k sample-values is 
dF —f(x,—60,2,—0,...2,—0,dx,... dom, . . (27.96) 
We seek for a statistic J, independent of the 6’s, to test the hypothesis ; and clearly, if the 
test is to be satisfactory, J must be independent of the origin, ie. must be seminvariant. 
The test that the 0's are all equal is then equivalent to testing the hypothesis 
9i pem o0; : : s . (27.97) 
Without loss of generality we may suppose the hypothesis rejected if J is small and less 


than some quantity depending on the acceptance value «, and we may also suppose J 
positive ; for if either condition is not satisfied we can transfer to some other function of 


J for which it is. 

In the sample space W, J must be constant along the line x, = ta = ... = Xp = COn- 
stant, and therefore the critical region w, will be the one lying outside a hypercylinder 
whose axis is parallel to this line. When H, is true, the probability of rejection is then 


T dE atl ae RR MEM 
Wo 
and when it is not true the probability is 


Í AEE DS s o) 
We 


= [ares em HENCE DAL M. 2700) 


where w is merely derived from w, by a translation in W without rotation. If Lis any line 
parallel to x, =... =% = 0, we write 


P (L) =| (Ene T) 
-{ iis a)dgt pa css” TU) 
where n=; Z (z); . . . ` 8 . (27.101) 


and 7 is thus the distance of the point (x, . . . æy) from the plane £ (x) = 0. 
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Now if w, is defined as the locus of all lines for which P (L) > h, a constant, P (L) will 
be less than h on any L which is in w but not in w, Hence 


Jar f ar, Be E (27.102) 
Wo w 


and so the resulting test is unbiassed. Thus an unbiassed test is given by choosing $J$ so 
that at any point of a line $L$ it is equal to $P(L)$ at that point. Now we may write for the 
variable co-ordinate on a particular $L$, say $\xi_i$, 
$$\xi_i = x_i - t,$$
where 
$$t = \frac{1}{k}\, \Sigma(x) - \frac{\eta}{\sqrt{k}}.$$
Hence 
$$P(L) = \sqrt{k} \int_{-\infty}^{\infty} f(x_1 - t, \ldots, x_k - t)\, dt. \tag{27.103}$$
Taking $J = \dfrac{1}{\sqrt{k}}\, P(L)$, we find 
$$J = \int_{-\infty}^{\infty} f(x_1 - t, \ldots, x_k - t)\, dt, \tag{27.104}$$
which gives us an unbiassed test. 


Example 27.5 
Consider the case where the variables are distributed normally with unit variance. 
$$f = \frac{1}{(2\pi)^{\frac12 k}} \exp\{-\tfrac12 \Sigma (x - \theta)^2\}.$$
Then we have, from (27.104), 
$$J = \frac{1}{(2\pi)^{\frac12 k}} \int_{-\infty}^{\infty} \exp\{-\tfrac12 \Sigma (x_i - t)^2\}\, dt = \frac{e^{-\frac12 S}}{\sqrt{k}\, (2\pi)^{\frac12 (k-1)}},$$
where $S = \Sigma (x - \bar{x})^2$. 
In practice we should take $S$ as our criterion, not $J$, and reject the hypothesis that 
the means were equal if $S$ exceeded some fixed value determined by $\alpha$. We observe 
that in fact $S$ is distributed as $\chi^2$ with $k - 1$ degrees of freedom when $H_0$ is true, so that 
this value is easily ascertained. 
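
The criterion of Example 27.5 can be illustrated numerically. The following is a minimal sketch, assuming k independent unit-variance normal values with a common mean; the function names and the simulation settings are illustrative and not part of the text. It computes S and checks by simulation that its average under the hypothesis is close to k - 1, the mean of a chi-squared variate with k - 1 degrees of freedom.

```python
import random


def pitman_location_statistic(values):
    """S = sum of squared deviations of the k values from their mean."""
    k = len(values)
    mean = sum(values) / k
    return sum((x - mean) ** 2 for x in values)


def simulate_null_mean(k, trials=20000, seed=1):
    """Average of S over simulated samples of k standard normal values."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        values = [rng.gauss(0.0, 1.0) for _ in range(k)]
        total += pitman_location_statistic(values)
    return total / trials
```

Since S is unchanged when a constant is added to every value, the simulated mean does not depend on the common mean assumed for the variates.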


27.22. Consider now the case where the frequency function is 


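$$\frac{1}{\theta_1 \ldots \theta_k}\, f\!\left(\frac{x_1}{\theta_1}, \ldots, \frac{x_k}{\theta_k}\right). \tag{27.105}$$
If the $x$'s are positive in range we put 
$$y_i = \log x_i, \qquad \phi_i = \log \theta_i, \tag{27.106}$$
and for the frequency function of the $y$'s we find 
$$\exp(\Sigma y - \Sigma \phi)\, f(e^{y_1 - \phi_1}, \ldots, e^{y_k - \phi_k}). \tag{27.107}$$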
This reduces to our first case, and we have an unbiassed criterion that 
$$\phi_1 = \phi_2 = \ldots = \phi_k$$
by putting 
$$J = \int_{-\infty}^{\infty} \exp(\Sigma y - kt)\, f(e^{y_1 - t}, \ldots, e^{y_k - t})\, dt = \Pi(x) \int_{0}^{\infty} f\!\left(\frac{x_1}{a}, \ldots, \frac{x_k}{a}\right) \frac{da}{a^{k+1}}. \tag{27.108}$$
When the $x$'s are not necessarily positive the expression remains the same, except that in 
(27.108) $\Pi(x)$ becomes $\Pi(|x|)$. Small values of $J$ are significant. 


27.23. Suppose now that our hypothesis asserts the equality of $\theta$'s or $\phi$'s and 
states that they have a common value $\theta_0$ or $\phi_0$, as the case may be. Then if we take 


k 
J' = (His ) fe ate sp) 4c : 5 . (27.109) 


the test will be unbiassed. Moreover, if we regard small values of $J'$ as significant and the 
$x$'s are independent, and if each frequency function is unimodal, then when 
$$\theta_1 = \theta_2 = \ldots = \theta_k = \theta_0$$
is not true the probability that $J'$ exceeds the specified limit based on $1 - \alpha$ increases as 
any $\theta$ tends to $\theta_0$. $J'$ therefore provides an unbiassed test. 


27.24. Finally, consider the case of k variates each distributed in the form typified by 


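$$dF = \frac{1}{\theta_i\, \Gamma(m_i)} \left(\frac{x_i}{\theta_i}\right)^{m_i - 1} \exp\!\left(-\frac{x_i}{\theta_i}\right) dx_i. \tag{27.110}$$
Their joint distribution is 
$$dF = \frac{\exp\!\left(-\Sigma \dfrac{x}{\theta}\right) \Pi \left(\dfrac{x}{\theta}\right)^{m-1}}{\Pi \{\theta\, \Gamma(m)\}}\; \Pi\, dx. \tag{27.111}$$
Hence, to test the hypothesis that the samples have the same $\theta$ we have 
$$J = \frac{\Pi(x^m)}{\Pi\{\Gamma(m)\}} \int_{0}^{\infty} \exp\!\left(-\frac{\Sigma x}{a}\right) \frac{da}{a^{M+1}} = \frac{\Gamma(M)}{\Pi\{\Gamma(m)\}}\, \frac{\Pi(x^m)}{(\Sigma x)^M}, \tag{27.112}$$
where $M = \Sigma(m)$. 
It is sometimes convenient to deal with 
$$K = \frac{\Pi(x^m)}{(\Sigma x)^M}, \tag{27.113}$$
which differs from $J$ only by a constant factor. 
The maximum value of $K$ is 
$$\frac{\Pi(m^m)}{M^M},$$
and we put 
$$L = \log \frac{\max K}{K} = M \log\!\left(\frac{\Sigma x}{M}\right) - \Sigma \left\{ m \log\!\left(\frac{x}{m}\right) \right\}. \tag{27.114}$$
$L$ is essentially not negative, and large values are significant. 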
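
As a numerical illustration of the criterion just defined, the sketch below evaluates L under the reading L = M log(Σx/M) − Σ m log(x/m), which is a reconstruction of (27.114) from the surrounding definitions; the function name is hypothetical. It shows that L vanishes when the x's are proportional to the m's and is positive otherwise.

```python
import math


def pitman_scale_criterion(x, m):
    """L = M*log(sum(x)/M) - sum(m_i*log(x_i/m_i)); all x_i, m_i must be positive."""
    if len(x) != len(m):
        raise ValueError("x and m must have the same length")
    big_m = sum(m)
    return big_m * math.log(sum(x) / big_m) - sum(
        mi * math.log(xi / mi) for xi, mi in zip(x, m)
    )
```

For instance, `pitman_scale_criterion([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])` is zero up to rounding, since each x is twice its m, while any departure from proportionality gives a positive value.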
For testing the hypothesis that a set of variances have some specified equal value, we 
find similarly from (27.109) 
$$L = \Sigma\!\left(\frac{x}{\theta_0}\right) - M - \Sigma \left\{ m \log\!\left(\frac{x}{m\theta_0}\right) \right\}. \tag{27.115}$$


27.25. The foregoing result has an immediate application to the case of k normal 
samples, for the variances are then distributed in the Type III form of equation (27.110). 
The criterion $L$ is then given by 
$$2L = N \log\!\left(\frac{\Sigma(\nu s^2)}{N}\right) - \Sigma(\nu \log s^2), \tag{27.116}$$
where $\nu$ as usual represents the number of degrees of freedom and $N = \Sigma(\nu)$. This, as 
will be seen by comparison with (26.93), is equivalent to Bartlett's test, and shows that 
it is unbiassed. 
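
A short sketch of the variance-homogeneity criterion for k normal samples follows. It assumes the uncorrected form N log(Σνs²/N) − Σ ν log s², with s² the variance estimate of each sample on ν degrees of freedom; the function name is illustrative, and no scale correction of the kind discussed in Exercise 27.2 is applied.

```python
import math


def bartlett_criterion(variances, dfs):
    """N*log(pooled variance) - sum(nu_i*log(s_i^2)), uncorrected form."""
    if len(variances) != len(dfs):
        raise ValueError("variances and dfs must have the same length")
    total_df = sum(dfs)
    pooled = sum(nu * s2 for nu, s2 in zip(dfs, variances)) / total_df
    return total_df * math.log(pooled) - sum(
        nu * math.log(s2) for nu, s2 in zip(dfs, variances)
    )
```

The value is zero when all the sample variances coincide and grows as they diverge, so large values count against the hypothesis of equal variances.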


NOTES AND REFERENCES 


For the theory of unbiassed tests see particularly Neyman and Pearson (1936; 1938) 
and Neyman (1935b). Regions of Type B have also been considered by Scheffé (1942a), 
who discusses a Type $B_1$ standing in relation to B as Type $A_1$ to Type A. 

For limiting properties see Neyman (1938b) and Wald (1941a). 

See also references to the previous chapter. 


EXERCISES 


27.1. Show that the test of Example 27.1 provides regions which are of Type $A_1$ 
as well as of Type $A$, and that the test is a U.M.P.U. one. 


27.2. Show that the cumulants of the distribution of $L$ of (27.114) are 
$$\kappa_1 = M\{G_1(M) - \log M\} - \Sigma[m\{G_1(m) - \log m\}],$$
$$\kappa_r = (-1)^r \{\Sigma\, m^r G_r(m) - M^r G_r(M)\}, \qquad r > 1,$$
where 
$$G_r(m) = \frac{d^r}{dm^r} \log \Gamma(m).$$
Hence show that the cumulants of $L/c$ are approximately $\kappa_r = \tfrac12 (k - 1)\, \Gamma(r)$, where 
$$c = 1 + \frac{1}{6(k-1)} \left\{ \Sigma\!\left(\frac{1}{m}\right) - \frac{1}{M} \right\},$$
and thus that $2L/c$ is distributed approximately as $\chi^2$ with $k - 1$ degrees of freedom. 
(Bartlett, 1937c; Pitman, 1939b.) 
27.3. Show that in samples of 3 from a normal population the distribution of the 
range $r$ is given by 


6 
oyn 


Hence that an unbiassed critical region of Type A is given by 


r 
V6 Ts 

[7 eir ip ev ay | =0 
0 "n 


LA fü. 
5 eHow dy =" mas e7W dy, 
0 0 


dF = 


A 
—f pevé 1 
e ic ——— e-W dy dr. 
if Ven: 


the region lying outside r, <r < fta 
(Neyman and Pearson, 1930.) 


CHAPTER 28 


MULTIVARIATE ANALYSIS 


28.1. We have already considered some aspects of the case in which each member 
of a population is characterised by several variates $x_1, \ldots, x_p$. For instance, we have 
examined the measurement of correlation between the variates and the regression of one 
variate on some or all of the others. In this chapter we shall extend our inquiries into 
the multivariate case a good deal further, mainly by taking into account the possibility 
that different sample-members may have emanated from different populations. This 
will lead to some generalisations of the methods already discussed for the univariate case, 
such as tests of homogeneity and tests of differences between two samples. Some of our 
known results generalise with nothing more than additional mathematical complexity ; 
but in others certain new features appear, and the theory of multivariate analysis is not 
entirely a matter of generalising univariate results to p dimensions. 


28.2. One or two examples will illustrate the kind of problem with which we are 
concerned. A number of skulls are discovered in a burial-ground. They are found to 
vary among themselves in the manner usual in biological material. Is the observed 
variation consistent with the hypothesis that all the skulls were derived from members of the 
same race or does it suggest a mixture of racial types? If heterogeneity is indicated, do 
the skulls fall into two well-defined categories, such as we might expect if the burial-ground 
were the site of a battle between two races such as Saxon and Celt; or are there several 
types such as we should expect in the normal burial-ground of a town where races were 
living together and interbreeding ? Or again, if the skulls are compared with another set 
known to have been buried at a much earlier time from the same race, is there any evidence 
of a significant change in skulls from one period to the other? 

There is no single measurement on a skull which is marked out from the infinite number 
of possible measurements for deciding questions of this kind. It is quite common for 
thirty or forty measurements to be taken by craniometricians on a single skull. Even if 
we reject many of these for practical reasons, leaving out the jawbone, for instance, because 
it is often separated from the skull and cannot be identified, we shall still be left with a 
number p which require consideration. For n skulls we shall then have n sets of p values 
corresponding to variates $x_1, \ldots, x_p$ which are, in general, correlated among themselves 
and may be highly so. Our problem is to test the homogeneity of these values, or to esti- 
mate differences between parent populations from which they were derived. We may, 
of course, apply methods which are already familiar by picking out one variate and testing 
for homogeneity. But we might pick out quite an unsuitable one and sacrifice most of the 
information. Even if time permits we cannot take each variate in turn and test it because 
the variates are correlated and our $p$ tests are not independent. 


28.3. Again, suppose we have two different breeds of laying hen and are given a 
batch of eggs from the hen-run without knowing which hen laid which egg. We require 
to allocate the eggs to the two breeds. Assuming that there is no decisive criterion such 
as colour of shell, we may measure various properties of the eggs such as length, breadth, 
weight, volume, specific gravity and so on. Some of these measurements will be highly 
correlated or, in the extreme case, perfectly correlated, as with weight, volume and specific 
gravity. In such circumstances we may reject some variates as redundant; but in general 
we shall be left with several sets of measurements. Our problem is to find some method 
based on the retained variates for allocating the eggs to the correct parent breed. In 
particular we might search for the best linear function of the variates to discriminate between 
breeds and to enable us to assign the eggs with the maximum probability of correctness. 


28.4. Throughout the whole chapter we shall, except when the contrary is stated, 
assume that the variation is normal. In addition, to render our formulae a little less 
cumbrous we shall borrow a summation convention from the tensor calculus. If the 
affixes $i$, $j$ range from 1 to $p$ we shall write 
$$A^{ij} a_{ij} = \sum_{i=1}^{p} \sum_{j=1}^{p} A^{ij} a_{ij}, \tag{28.1}$$
the affixes to $A$ being regarded as ordinary superscripts, not as powers. Similarly we 
shall have 
$$A^{ij} a_{il} = \sum_{i=1}^{p} A^{ij} a_{il}. \tag{28.2}$$
Whenever an affix occurs as a superscript and a subscript, summation is to be understood. 
Clearly the actual letter used is a dummy and we have, for instance, 
$$A^{ij} a_{ij} = A^{kl} a_{kl} = A^{lm} a_{lm}. \tag{28.3}$$
We shall write the array of values $A^{ij}$ (a square matrix) as $(A^{ij})$ and its determinant 
as $|A^{ij}|$ or simply as $|A|$. 
To every matrix $(a_{ij})$ with a non-vanishing determinant there corresponds a reciprocal 
or inverse matrix which we may write $(a^{ij})$. Since 
$$(a_{ij})(a^{ij}) = 1,$$
we have, on carrying out the multiplication, 
$$a_{ij} a^{ik} = 1, \quad j = k; \qquad a_{ij} a^{ik} = 0, \quad j \neq k,$$
which we may express as 
$$a_{ij} a^{ik} = a_{ji} a^{ki} = \delta_j^k, \tag{28.4}$$
where $\delta_j^k$, one form of the Kronecker delta, is zero if $j \neq k$ and unity otherwise. The 
quantity $a^{ij}$ is the minor of $a_{ij}$ in $|a|$ divided by $|a|$ itself. 
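
The summation convention and the reciprocal relation (28.4) can be made concrete with a small sketch. It assumes matrices held as nested lists and, for brevity, treats only the 2 × 2 case for the inverse; all function names are illustrative.

```python
def contract(upper, lower):
    """A^{ij} a_{ij}: sum over both repeated affixes i and j."""
    p = len(upper)
    return sum(upper[i][j] * lower[i][j] for i in range(p) for j in range(p))


def inverse_2x2(a):
    """Reciprocal matrix (a^{ij}) of a 2 x 2 matrix with non-vanishing determinant."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("determinant vanishes")
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def kronecker_check(a, a_inv):
    """a_{ij} a^{ik} summed over i, returned as the array indexed by (j, k)."""
    p = len(a)
    return [
        [sum(a[i][j] * a_inv[i][k] for i in range(p)) for k in range(p)]
        for j in range(p)
    ]
```

For a symmetric matrix such as `[[2.0, 1.0], [1.0, 3.0]]`, `kronecker_check` returns the unit matrix up to rounding, which is the statement that the contracted product is the Kronecker delta.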


28.5. It will further simplify our formulae and will give rise to no loss of generality 
if we suppose our variates to be in standard measure, that is to say, to have zero mean 
and unit variance. If we require results for the more general case we can easily obtain 
them from transformations of the type 

$$\xi_i = \sigma_i x_i + \mu_i. \tag{28.5}$$
With this convention the equation of the multivariate normal distribution (cf. 15.12, 
vol. I, p. 376) may be written: 
$$dF = \frac{|A|^{\frac12}}{(2\pi)^{\frac12 p}} \exp(-\tfrac12 A^{ij} x_i x_j)\, dx_1 \ldots dx_p, \tag{28.6}$$
where the $A$'s are related to the correlation determinant 
Loon EN EU i. (agr) 
In fact $(A^{ij})$ is reciprocal to $(\rho_{ij})$, as we saw in 15.12. 


28.6. We shall also frequently refer to the matrix of sample variances and covariances 
which we shall call the dispersion matrix and write as $(a_{ij})$, where 
$$a_{ij} = \frac{1}{n} \sum_{k=1}^{n} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j). \tag{28.8}$$
This, it is to be remembered, is in standard measure for the population, that is to say the 
observed variates are taken from the parent means and divided by the parent standard 
deviations. 
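
A minimal sketch of the dispersion matrix follows, assuming the divisor n and deviations from the sample means (a reading consistent with the later relation $b_{11} = ns^2$); the data layout, one row per sample member, and the function name are illustrative choices.

```python
def dispersion_matrix(data):
    """(a_ij): mean products of deviations from the sample means, divisor n.

    data[k][i] is the value of the i-th variate for the k-th sample member.
    """
    n = len(data)
    p = len(data[0])
    means = [sum(row[i] for row in data) / n for i in range(p)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / n
            for j in range(p)
        ]
        for i in range(p)
    ]
```

To work in standard measure for the population, each observation would first be reduced by the parent mean and divided by the parent standard deviation before this function is applied.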


Wishart’s Distribution 

28.7. We now proceed to generalise to p variates the joint distribution of dispersions 
arrived at in 14.12 (vol. I, p. 339) for the bivariate case; and we shall also show that 
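the distribution is independent of that of means. The result and method of proof are 
due to Wishart (1928). 

First of all let us write the result for the bivariate case in our new notation. For 
the distribution of means we have 
$$dF = \frac{n |A|^{\frac12}}{2\pi} \exp\!\left(-\frac{n}{2} A^{ij} \bar{x}_i \bar{x}_j\right) d\bar{x}_1\, d\bar{x}_2, \qquad i, j = 1, 2, \tag{28.9}$$
and for that of dispersions 
$$dF = \left(\frac{n}{2}\right)^{n-1} \frac{|A|^{\frac12 (n-1)}\, |a|^{\frac12 (n-4)}}{\sqrt{\pi}\; \Gamma\!\left(\frac{n-1}{2}\right) \Gamma\!\left(\frac{n-2}{2}\right)} \exp\!\left(-\frac{n}{2} A^{ij} a_{ij}\right) da_{11}\, da_{12}\, da_{22}. \tag{28.10}$$
For instance, we have 
$$a_{11} = s_1^2, \qquad a_{12} = r s_1 s_2, \qquad a_{22} = s_2^2,$$
$$(A^{ij}) = \frac{1}{1 - \rho^2} \begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix},$$
so that (28.10) is equivalent to 
$$dF = \frac{n^{n-1} (s_1 s_2)^{n-2} (1 - r^2)^{\frac12 (n-4)}}{2^{n-3} \sqrt{\pi}\; \Gamma\!\left(\frac{n-1}{2}\right) \Gamma\!\left(\frac{n-2}{2}\right) (1 - \rho^2)^{\frac12 (n-1)}} \exp\left\{ -\frac{n}{2(1 - \rho^2)} (s_1^2 - 2\rho r s_1 s_2 + s_2^2) \right\} ds_1\, ds_2\, dr.$$
This, with the substitution 
$$\Gamma\!\left(\frac{n-1}{2}\right) \Gamma\!\left(\frac{n-2}{2}\right) = \frac{\sqrt{\pi}\; \Gamma(n-2)}{2^{n-3}},$$
is the form found in equation (14.44), vol. I, p. 342, when it is remembered that we are 
working in standard measure. 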
28.8. Now consider the general case. With a sample of $n$ values of $p$ variates we 
consider $p$ rectangular spaces of $n$ dimensions each as the domain of variation. If a point 
in one of these spaces be fixed, the variation in the other spaces is constrained for fixed 
values of the sample dispersions. The following argument is a generalisation of that given 
in 14.12 leading to the bivariate result, and the reader may like to refresh his memory 
by re-reading that section. 

Writing $x_{j1}, \ldots, x_{jn}$ for the $n$ values of the $j$th variate, we have for the density function 
of the whole sample, from (28.6), 
$$\begin{aligned} f &= \frac{|A|^{\frac12 n}}{(2\pi)^{\frac12 np}} \exp\left\{ -\tfrac12 \sum_{k=1}^{n} A^{ij} x_{ik} x_{jk} \right\} \\ &= \frac{|A|^{\frac12 n}}{(2\pi)^{\frac12 np}} \exp\left\{ -\tfrac12 \sum_{k=1}^{n} A^{ij} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j) \right\} \times \exp\!\left( -\frac{n}{2} A^{ij} \bar{x}_i \bar{x}_j \right). \end{aligned} \tag{28.11}$$
We may thus factorise the density function into two parts, 
$$f_1 = \frac{n^{\frac12 p} |A|^{\frac12}}{(2\pi)^{\frac12 p}} \exp\!\left( -\frac{n}{2} A^{ij} \bar{x}_i \bar{x}_j \right), \tag{28.12}$$
and 
$$f_2 = \frac{|A|^{\frac12 (n-1)}}{n^{\frac12 p} (2\pi)^{\frac12 p(n-1)}} \exp\!\left( -\frac{n}{2} A^{ij} a_{ij} \right), \tag{28.13}$$
where we have chosen the constant factor of $f_1$ so that the distribution shall have the total 
frequency unity. 
Consider now the volume element $\prod_{k=1}^{n} dx_{1k}\, dx_{2k} \ldots dx_{pk}$. In any particular $n$-space 
the density is constant over hyperspheres centred at the mean. The volume element may 
then be represented as the product of elements $d\bar{x}_i$ and of independent elements depending 
on dispersions. In the total space of $pn$ dimensions the volume element may thus be 
represented as the product of $p$ elements $d\bar{x}_i$ and an independent element depending on 
dispersions. Thus the volume element also factorises, and we have immediately for the 
distribution of means 
$$dF = \frac{n^{\frac12 p} |A|^{\frac12}}{(2\pi)^{\frac12 p}} \exp\!\left( -\frac{n}{2} A^{ij} \bar{x}_i \bar{x}_j \right) d\bar{x}_1 \ldots d\bar{x}_p, \tag{28.14}$$
showing that the means are distributed in the multivariate normal form independently 
of dispersions. 
If we define a matrix $(B)$ with elements $\tfrac12 n$ times those of $(A)$, we may write the 
distribution of means in the simple form 
$$dF = \frac{|B|^{\frac12}}{\pi^{\frac12 p}} \exp(-B^{ij} \bar{x}_i \bar{x}_j)\, \Pi\, d\bar{x}. \tag{28.15}$$
We note that this checks with the known results for $p = 1$ and $p = 2$. It is also seen 
almost at once that the variance of $\bar{x}_i$ is $\sigma_i^2/n$, as we expect. 


28.9. We have now to consider the more complicated expression for the volume 
element of dispersions. Let us in the first instance transfer our origins to the sample means, 
remembering that in doing so we have lost one dimension (or degree of freedom) in the 
variation of our sample-points. Let $P_1, \ldots, P_p$ be the sample-points whose co-ordinates 
are the $n$ values of $x_1, \ldots, x_p$, one point $P$ lying in each $n$-space. We shall consider in 
turn the variation of $P_1$, then that of $P_2$ for fixed $P_1$, then that of $P_3$ for fixed $P_1$ and $P_2$, 
and so on. The total variation will be given by multiplying the various expressions so 
obtained; and it will be sufficient if we consider the typical case of the variation of $P_m$ 
for $m - 1$ fixed points $P_1, \ldots, P_{m-1}$. 

For a fixed length $OP_m$ and fixed angles with $OP_1, \ldots, OP_{m-1}$, $P_m$ can vary on a 
hypersphere of $n - m$ dimensions; for, if we fix any particular angle, $P$ is constrained 
to lie on a hypercone which cuts its hypersphere of variation in a hypersphere of one fewer 
dimensions, and the fixation of the origin at the sample mean imposes a further constraint. 
Further, if we regard the $p$ spaces as superposed, as we may, the centre of this $(n - m)$-dimensional 
hypersphere is the foot of the perpendicular from $P_m$ on to the space containing 
the points $O, P_1, \ldots, P_{m-1}$. Call the length of this perpendicular for the time being $r_m$. 

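The volume of a $k$-dimensional hypersphere of radius $r$ is 
$$\frac{\pi^{\frac12 k}\, r^k}{\Gamma(\tfrac12 k + 1)}$$
and its surface area, obtained by differentiating with respect to $r$, is 
$$\frac{2 \pi^{\frac12 k}\, r^{k-1}}{\Gamma(\tfrac12 k)}. \tag{28.16}$$
The surface area of the hypersphere of variation of $P_m$ is thus 
$$\frac{2 \pi^{\frac12 (n-m)}\, r_m^{\,n-m-1}}{\Gamma\{\tfrac12 (n-m)\}}. \tag{28.17}$$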
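
The two hypersphere formulae can be checked against familiar low-dimensional cases. The sketch below, with illustrative function names, evaluates the volume and surface area of a k-dimensional hypersphere and can be used to confirm that the second is the derivative of the first with respect to r.

```python
import math


def hypersphere_volume(k, r):
    """Volume of a k-dimensional hypersphere of radius r."""
    return math.pi ** (k / 2) * r ** k / math.gamma(k / 2 + 1)


def hypersphere_surface(k, r):
    """Surface area of a k-dimensional hypersphere, the r-derivative of its volume."""
    return 2 * math.pi ** (k / 2) * r ** (k - 1) / math.gamma(k / 2)
```

With k = 2 these give the area and circumference of a circle, and with k = 3 the volume and surface of an ordinary sphere.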


To find the element of volume due to the variation of $P_m$ and the angles which $OP_m$ 
makes with $OP_1, \ldots, OP_{m-1}$ we have to multiply (28.17) by an element of variation 
normal to the hypersphere of $n - m$ dimensions. This variation lies in the hyperplane 
determined by the origin and $P_1, \ldots, P_m$, which is, in fact, normal to the hypersphere. 


'To evaluate it, consider the transformation 
m 
m = oT yrs fT. Mm, . < . (28.18) 
k=l 
where, of course, the z's are measured from the sample means in virtue of our choice of 
origin. We have for the Jacobian— 


J= 9 (Eni DEAE: Se) 
9 (zy ehm Emm) 
| wy Vi. Tim 
=| Va Taz Tom 
21m COP EE Pre 
Lo EMIL IENNMEN ELS. f2819) 


D 


where Ym is the volume (or “ content ") of the hyperparallelopiped having one corner at 
the origin and edges running to the points P, .. . Pp Furthermore, 


| Ems | = | an | 
a 2 


PE mk 
= Um: "LI (98.20) 


E 




The required element is thus 


and the total element of variation of Pw on multiplication by (28.17), is 


4(n—m) ,n—m—l m 
z qu 


MENS Ta ET gets E 0: hele PESE) 
r(? z-) 3 k=1 
STD] m 


Now r, is the length of the perpendicular from P, on to the space OP, . . . Jen 
and is therefore equal to v,,/v,, ,. Hence, for the variation of Pm we have the element 


gqin-m) pci m 


GIU D Hd dE P CEU EE S mS 
n (* — 2) yi-m-i ni 


We now derive the total element for variation of P, . .. Pm by multiplying expressions 
of type (28.22) for m = 1, 2,... p. The terms in v cancel except v, and vo the latter 
being unity, and we find 


già? (2n—p-1) 


Dn 
UD Re H i dép : > . (28.23) 
Er j-1 kem 
k-i 2 


Now from (28.18) we have 


buena [t an Va o A MUNI TE 
and from (28.20) v? — m? | a |. P E y n . (28.25) 


Making the necessary substitutions in (28.23) and adjoining the frequency element given 
by (28.13) we find, after a little reduction, 


$$dF = \frac{\left(\dfrac{n}{2}\right)^{\frac12 p(n-1)} |A|^{\frac12 (n-1)}\, |a|^{\frac12 (n-p-2)}}{\pi^{\frac14 p(p-1)} \displaystyle\prod_{k=1}^{p} \Gamma\!\left(\frac{n-k}{2}\right)} \exp\!\left( -\frac{n}{2} A^{ij} a_{ij} \right) \Pi\, da. \tag{28.26}$$
This is Wishart's generalisation of the distribution of dispersions in a multivariate 
normal system. The reader who feels that the foregoing proof demands too much of his 
powers of geometrical insight may refer to alternative derivations by Wishart and Bartlett 
(1933c) or P. L. Hsu (1939a). The domain of variation of the $a$'s is 0 to $\infty$ for $a_{ii}$ and 
corresponding values for $a_{ij}$, $i \neq j$, such that correlations do not exceed unity in absolute 
value. 


28.10. It must be remembered that we are regarding $a_{ij}$ as the same as $a_{ji}$ and that 
the product of differential elements in (28.26) contains $\tfrac12 p(p + 1)$ items, not $p^2$; for there 
are $p$ elements of the form $da_{ii}$ and $\tfrac12 p(p - 1)$ of the form $da_{ij}$, $i \neq j$. The expanded form 
of $A^{ij} a_{ij}$, however, takes place over $i$, $j$ from 1 to $p$, so that any particular term such as 
$A^{12} a_{12}$ occurs twice, once as $A^{12} a_{12}$ and once as $A^{21} a_{21}$; except that when $i = j$ the term 
occurs once. For instance, with $p = 2$ we have 
$$A^{ij} a_{ij} = A^{11} a_{11} + 2A^{12} a_{12} + A^{22} a_{22}. \tag{28.27}$$

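We can now derive the characteristic function of the Wishart distribution. Ignoring 
constant factors and writing a single integral sign for summation over all $a_{ij}$, we have, 
from (28.26), 
$$\int |a|^{\frac12 (n-p-2)} \exp\!\left( -\frac{n}{2} A^{ij} a_{ij} \right) \Pi\, da = \frac{K}{|A|^{\frac12 (n-1)}}, \tag{28.28}$$
where $K$ is some constant. In this form let us replace $A^{ij}$ by $A^{ij} - \dfrac{1}{n}\theta_{ij}$ when $i \neq j$ and 
by $A^{ii} - \dfrac{2}{n}\theta_{ii}$ when $i = j$. Then the resulting integral is the characteristic function of 
the $a$'s, $\theta_{ij}$ being the parameter $it$ corresponding to $a_{ij}$. We thus have 
$$\phi(\theta_{ij}) = |A|^{\frac12 (n-1)} \begin{vmatrix} A^{11} - \dfrac{2}{n}\theta_{11} & A^{12} - \dfrac{1}{n}\theta_{12} & \ldots & A^{1p} - \dfrac{1}{n}\theta_{1p} \\ A^{12} - \dfrac{1}{n}\theta_{12} & A^{22} - \dfrac{2}{n}\theta_{22} & \ldots & A^{2p} - \dfrac{1}{n}\theta_{2p} \\ \vdots & \vdots & & \vdots \\ A^{1p} - \dfrac{1}{n}\theta_{1p} & A^{2p} - \dfrac{1}{n}\theta_{2p} & \ldots & A^{pp} - \dfrac{2}{n}\theta_{pp} \end{vmatrix}^{-\frac12 (n-1)}, \tag{28.29}$$
the constant being evaluated by the consideration that $\phi(0) = 1$. 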


Example 28.1 


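Let us apply these results to an examination of the moments of the distribution of 
covariance in the bivariate case. We have 
$$A^{11} = A^{22} = \frac{1}{1 - \rho^2}, \qquad A^{12} = -\frac{\rho}{1 - \rho^2}.$$
We then find for the c.f. of $a_{11}$, $a_{12}$, $a_{22}$ 
$$\phi = (1 - \rho^2)^{-\frac12 (n-1)} \begin{vmatrix} \dfrac{1}{1 - \rho^2} - \dfrac{2\theta_{11}}{n} & -\dfrac{\rho}{1 - \rho^2} - \dfrac{\theta_{12}}{n} \\ -\dfrac{\rho}{1 - \rho^2} - \dfrac{\theta_{12}}{n} & \dfrac{1}{1 - \rho^2} - \dfrac{2\theta_{22}}{n} \end{vmatrix}^{-\frac12 (n-1)}.$$
We are interested only in the parameter $\theta_{12}$ which we will write as $\theta$, putting the others 
equal to zero. We then find 
$$\phi(\theta) = \left[ (1 - \rho^2) \left\{ \frac{1}{(1 - \rho^2)^2} - \left( \frac{\rho}{1 - \rho^2} + \frac{\theta}{n} \right)^2 \right\} \right]^{-\frac12 (n-1)} = \left\{ 1 - \frac{2\rho\theta}{n} - \frac{(1 - \rho^2)\theta^2}{n^2} \right\}^{-\frac12 (n-1)}.$$
Taking logarithms and evaluating coefficients of powers of $\theta$, we find for the cumulants 
$$\kappa_1 = \frac{n-1}{n}\, \rho,$$
$$\kappa_2 = \frac{n-1}{n^2}\, (1 + \rho^2),$$
$$\kappa_3 = \frac{2(n-1)}{n^3}\, \rho (3 + \rho^2),$$
$$\kappa_4 = \frac{6(n-1)}{n^4}\, (1 + 6\rho^2 + \rho^4).$$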
In standard measure the distribution tends to normality as $n$ tends to infinity. But for 
finite $n$ we have 
$$\beta_1 = \frac{4}{n-1}\, \frac{\rho^2 (3 + \rho^2)^2}{(1 + \rho^2)^3},$$
$$\beta_2 = 3 + \frac{6}{n-1}\, \frac{1 + 6\rho^2 + \rho^4}{(1 + \rho^2)^2}.$$
Thus, even when $\rho = 0$ our distribution, though symmetrical, is not normal. 
Wishart (1928) has given formulae as far as those of the fourth order for eight or 
fewer variates. 
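
The cumulants of Example 28.1 can be cross-checked by a route that is not in the text: the quadratic in the characteristic function factorises as {1 − (1 + ρ)θ/n}{1 + (1 − ρ)θ/n}, so that expanding the two logarithms gives a general r-th cumulant. The sketch below, with hypothetical function names, codes that general expression, the four closed forms as reconstructed here from the garbled source, and the resulting β coefficients.

```python
import math


def covariance_cumulant(r, n, rho):
    """r-th cumulant of the sample covariance, from the factorised c.f."""
    bracket = (1 + rho) ** r + (-1) ** r * (1 - rho) ** r
    return (n - 1) / 2 * math.factorial(r - 1) * bracket / n ** r


def closed_form_cumulants(n, rho):
    """First four cumulants in the closed forms given in Example 28.1."""
    return [
        (n - 1) * rho / n,
        (n - 1) * (1 + rho ** 2) / n ** 2,
        2 * (n - 1) * rho * (3 + rho ** 2) / n ** 3,
        6 * (n - 1) * (1 + 6 * rho ** 2 + rho ** 4) / n ** 4,
    ]


def beta_coefficients(n, rho):
    """beta_1 = k3^2 / k2^3 and beta_2 = 3 + k4 / k2^2."""
    k2 = covariance_cumulant(2, n, rho)
    k3 = covariance_cumulant(3, n, rho)
    k4 = covariance_cumulant(4, n, rho)
    return k3 ** 2 / k2 ** 3, 3 + k4 / k2 ** 2
```

With ρ = 0 the first coefficient vanishes while the second exceeds 3 by 6/(n − 1), which is the sense in which the distribution is symmetrical but not normal.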


Hotelling’s Distribution 

28.11. In the univariate case we can test the significance of a mean by comparing 
it with the estimated standard deviation, the ratio being distributed in “ Student’s ” form 
(or some simple transformation of it if we compare the mean with the actual sample variance 
and not the unbiassed estimator). We proceed to generalise this result. 

We require a single quantity which will serve as a measure of departure of all the means 
$\bar{x}_i$ from the population values which, as usual, we take to be zero. In place of the matrix 
of dispersions, we shall consider the matrix of sums of squares and products $(b_{ij})$ where 
$$b_{ij} = \sum_{k=1}^{n} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j). \tag{28.30}$$
As usual we take $(b^{ij})$ to be the matrix inverse to $(b_{ij})$. Let us now write 
$$T^2 = n(n-1)\, b^{ij} \bar{x}_i \bar{x}_j. \tag{28.31}$$
This is Hotelling's generalisation of the "Student" ratio $t$. 
In the simplest case when $p = 1$ we have 
$$b_{11} = ns^2, \qquad b^{11} = \frac{1}{ns^2},$$
and hence 
$$T^2 = \frac{(n-1)\, \bar{x}^2}{s^2}, \tag{28.32}$$
so that $T$ becomes equal to the ratio $t$ as required. 
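
The definition of T² can be sketched in code as follows, assuming population means of zero and data stored one sample member per row; the helper names are illustrative. Rather than inverting $(b_{ij})$ explicitly, the sketch solves the linear system $b\,y = \bar{x}$ and forms $n(n-1)\,\bar{x}_i y_i$, which is the same quantity.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    p = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, p):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[row][c] -= factor * aug[col][c]
    y = [0.0] * p
    for row in range(p - 1, -1, -1):
        tail = sum(aug[row][c] * y[c] for c in range(row + 1, p))
        y[row] = (aug[row][p] - tail) / aug[row][row]
    return y


def hotelling_t2(data):
    """T^2 = n(n-1) b^{ij} xbar_i xbar_j for data[k][i], hypothesis means zero."""
    n = len(data)
    p = len(data[0])
    means = [sum(row[i] for row in data) / n for i in range(p)]
    b = [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) for j in range(p)]
        for i in range(p)
    ]
    y = solve_linear(b, means)
    return n * (n - 1) * sum(means[i] * y[i] for i in range(p))
```

For a single variate with values 1, 2, 4 the function returns 7, which agrees with $(n-1)\bar{x}^2/s^2$ computed directly.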


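28.12. We have 
$$\frac{T^2}{n-1} = n\, b^{ij} \bar{x}_i \bar{x}_j. \tag{28.33}$$
Let us now denote by $m_{ij}$ the sum of squares or products about the origin, so that 
$$m_{ij} = b_{ij} + n \bar{x}_i \bar{x}_j. \tag{28.34}$$
The determinant of $m$ may be written 
$$|m_{ij}| = \begin{vmatrix} 1 & \bar{x}_1 \sqrt{n} & \bar{x}_2 \sqrt{n} & \ldots & \bar{x}_p \sqrt{n} \\ 0 & b_{11} + n\bar{x}_1^2 & b_{12} + n\bar{x}_1 \bar{x}_2 & \ldots & b_{1p} + n\bar{x}_1 \bar{x}_p \\ 0 & b_{21} + n\bar{x}_2 \bar{x}_1 & b_{22} + n\bar{x}_2^2 & \ldots & b_{2p} + n\bar{x}_2 \bar{x}_p \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & b_{p1} + n\bar{x}_p \bar{x}_1 & b_{p2} + n\bar{x}_p \bar{x}_2 & \ldots & b_{pp} + n\bar{x}_p^2 \end{vmatrix}.$$
On subtracting $\bar{x}_1 \sqrt{n}$ times the first row from the second, and so on, we find 
$$|m_{ij}| = \begin{vmatrix} 1 & \bar{x}_1 \sqrt{n} & \ldots & \bar{x}_p \sqrt{n} \\ -\bar{x}_1 \sqrt{n} & b_{11} & \ldots & b_{1p} \\ \vdots & \vdots & & \vdots \\ -\bar{x}_p \sqrt{n} & b_{p1} & \ldots & b_{pp} \end{vmatrix},$$
and on expanding according to the border row and column, 
$$|m_{ij}| = |b_{ij}| + n\, b^{ij} \bar{x}_i \bar{x}_j\, |b_{ij}|. \tag{28.35}$$
It follows that 
$$\frac{1}{1 + \dfrac{T^2}{n-1}} = \frac{|b_{ij}|}{|m_{ij}|}. \tag{28.36}$$
This is a fundamental equation in the sampling theory of $T$ and we proceed to interpret 
it geometrically. 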
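
The fundamental relation between T² and the two determinants can be verified numerically for two variates. This is a sketch under the assumption that the relation reads 1/{1 + T²/(n − 1)} = |b|/|m|, with b the product-sums about the sample means and m those about the origin; the function names are illustrative.

```python
def product_sums(data, about_mean):
    """Sums of squares and products about the sample means or about the origin."""
    n = len(data)
    p = len(data[0])
    centre = [sum(row[i] for row in data) / n if about_mean else 0.0 for i in range(p)]
    return [
        [sum((row[i] - centre[i]) * (row[j] - centre[j]) for row in data) for j in range(p)]
        for i in range(p)
    ]


def det2(mat):
    """Determinant of a 2 x 2 matrix."""
    return mat[0][0] * mat[1][1] - mat[0][1] * mat[1][0]


def t2_bivariate(data):
    """T^2 = n(n-1) b^{ij} xbar_i xbar_j for two variates, via the explicit inverse."""
    n = len(data)
    means = [sum(row[i] for row in data) / n for i in range(2)]
    b = product_sums(data, about_mean=True)
    d = det2(b)
    b_inv = [[b[1][1] / d, -b[0][1] / d], [-b[1][0] / d, b[0][0] / d]]
    quad = sum(b_inv[i][j] * means[i] * means[j] for i in range(2) for j in range(2))
    return n * (n - 1) * quad
```

Comparing `1 / (1 + t2_bivariate(data) / (n - 1))` with `det2(b) / det2(m)` for any non-degenerate bivariate sample should give agreement to rounding error.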


28.13. In the case $p = 1$ we have a single sample space of $n$ dimensions. The 
numerator and denominator of (28.36) then reduce to $b_{11}$ and $m_{11}$, that is to say, the squares of 
distances from the sample-point $P_1$ to its projection on the unit vector whose direction 
cosines are all equal, and from $P_1$ to the origin, respectively. The ratio of (28.36) has 
zero dimensions and is in fact the square of the sine of the angle between $OP_1$ and the unit 
vector. This is the geometrical approach which gave us "Student's" distribution in 
Example 10.6 (vol. I, p. 239). 

In the general case let us regard the $p$ $n$-spaces as superposed in one $n$-space. The 
points $P_1, \ldots, P_p$ will lie in a space of $p - 1$ dimensions, a hyperplane in the $n$-space. 
Now we may rotate the axis without altering the functions $|m_{ij}|$ or $|b_{ij}|$ which are easily 
seen to be invariant under orthogonal variate-transformations. If we perform such a 
rotation so as to bring the $(p - 1)$-space of sample-points into correspondence with $p - 1$ 
co-ordinate dimensions, we see from (28.20) that $|m_{ij}|$ is the square of the content of a 
hyperparallelopiped with one corner at the origin and sides parallel to $OP_1, \ldots, OP_p$. 
Now consider a hyperplane perpendicular to the unit vector meeting it, say, in $O'$, 
and let $P'_1, \ldots, P'_p$ be the projections of the points $P$ on to this hyperplane. Then $b_{ij}$ 
is the covariance of the co-ordinates $P'_i$ and $P'_j$ referred to $O'$, and hence $|b_{ij}|$ is the square 
of the content of the hyperparallelopiped in the hyperplane. Furthermore, the content 
of this figure bears to that given by $|m_{ij}|$ a ratio equal to the cosine of the angle between 
the unit vector and the hyperplane. Representing this angle by $\theta$, we have 
$$\cos^2 \theta = \frac{|b_{ij}|}{|m_{ij}|} = \frac{1}{1 + \dfrac{T^2}{n-1}}. \tag{28.37}$$


28.14. Now if the sample-points $P$ are distributed in the $n$-space with random 
orientation, the hyperplane which they determine will be distributed randomly in regard 
to the angle which it makes with a fixed vector, and in particular with the unit vector. 
The sampling distribution of $\theta$ is then that of an angle between a fixed vector and a random 
plane. But this, from a slightly different viewpoint, is precisely the problem of distribution 
which we solved in connection with the multiple correlation coefficient $R$, for we saw (15.18, 
vol. I, p. 381) that $R$ is the sine of the angle between a residual vector represented by a 
variate $x_1$ and the space containing other variates $x_2, \ldots, x_p$; and in the case when 
the former is independent of the latter we can regard it as fixed. Thus, from (28.37) we 
may write 
$$R^2 = \sin^2 \theta = \frac{T^2/(n-1)}{1 + T^2/(n-1)}. \tag{28.38}$$
The distribution of $R^2$ in the case when the variate concerned is independent of the 
others is 
$$dF = \frac{1}{B\!\left(\dfrac{p-1}{2}, \dfrac{n-p}{2}\right)}\, (1 - R^2)^{\frac12 (n-p-2)}\, (R^2)^{\frac12 (p-3)}\, d(R^2), \tag{28.39}$$
where we must remember that $p$ is the total number of variates and the variates are measured 
from their means in forming the regression equation. Before substituting (28.38) in this 
expression we must increase $p$ by unity, since in effect we are considering $p + 1$ variates, 
the unit vector determining an additional one; and we must also increase $n$ by unity 
because our variation is not restricted to that about the mean, as for multiple correlation. 
With these alterations in (28.39), we have, on substituting for $R$ from (28.38) and a little 
reduction, 
$$dF = \frac{1}{B\!\left(\dfrac{n-p}{2}, \dfrac{p}{2}\right)}\, \frac{\{T^2/(n-1)\}^{\frac12 (p-2)}}{\{1 + T^2/(n-1)\}^{\frac12 n}}\, d\!\left( \frac{T^2}{n-1} \right). \tag{28.40}$$
This is the distribution of Hotelling's generalisation of "Student's" ratio. 


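28.15. At the end of the chapter we shall see that this is a particular case of a more 
general distribution (28.31). A third and instructive derivation, due to Wilks, is as 
follows: 
From the manner of derivation of Wishart's distribution it will be clear that if we 
substitute the moments about the origin $a'_{ij}$ for those about the mean $a_{ij}$ the distribution 
is the same, except that there is an extra degree of freedom. The distribution is then 
$$dF = \frac{\left(\dfrac{n}{2}\right)^{\frac12 pn} |A|^{\frac12 n}\, |a'|^{\frac12 (n-p-1)}}{\pi^{\frac14 p(p-1)} \displaystyle\prod_{k=1}^{p} \Gamma\!\left(\frac{n+1-k}{2}\right)} \exp\!\left( -\frac{n}{2} A^{ij} a'_{ij} \right) \Pi\, da'.$$
Putting $B^{ij} = \tfrac12 n A^{ij}$, we find, on integration, 
$$\int |a'|^{\frac12 (n-p-1)} \exp(-B^{ij} a'_{ij})\, \Pi\, da' = \frac{\pi^{\frac14 p(p-1)} \displaystyle\prod_{k=1}^{p} \Gamma\!\left(\frac{n+1-k}{2}\right)}{|B|^{\frac12 n}}. \tag{28.41}$$
Now replace $n$ by $n + 2r$ in this expression and divide by the term on the right in (28.41). 
The result is to give us the $r$th moment of $|a'|$ as 
$$\mu'_r(|a'|) = \frac{1}{|B|^r} \prod_{k=1}^{p} \frac{\Gamma\!\left(\dfrac{n+1-k}{2} + r\right)}{\Gamma\!\left(\dfrac{n+1-k}{2}\right)}. \tag{28.42}$$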


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We may also write the distribution of a; in the form given by our original derivation of 
Wishart’s distribution :— 
in-1) 1 (n—p—3) 
gp 1B? Da] 


j Bi! 
exp (— BY ay) IT da x 181 
ginin-) ur( 


n — mP 


exp (— B" &, à;) II dë. 


Multiply this by | a’ |", integrate, and use (28.42), transferring constant terms to the right 
as in (28.41); then replace n by n + 2s and divide by the constant terms as they were 
before substitution. We find— 


(eee reet 
Hc T 2 . (28.43) 


2 
EL E LE = r(e) r(25*) 


Now put $r = -s$ and note that 
$$\frac{|a|}{|a'|} = \frac{|b|}{|m|}.$$
We find 
$$E\left\{ \left( \frac{|b|}{|m|} \right)^{s} \right\} = \frac{\Gamma\!\left(\dfrac{n}{2}\right) \Gamma\!\left(\dfrac{n-p}{2} + s\right)}{\Gamma\!\left(\dfrac{n-p}{2}\right) \Gamma\!\left(\dfrac{n}{2} + s\right)}. \tag{28.44}$$
Now the function on the right is the $s$th moment of 
$$dF = \frac{1}{B\!\left(\dfrac{n-p}{2}, \dfrac{p}{2}\right)}\, x^{\frac12 (n-p-2)} (1 - x)^{\frac12 (p-2)}\, dx, \tag{28.45}$$
which is uniquely determined by its moments. This, then, is the distribution of the ratio 
$|b|/|m|$, and on substitution in terms of $T$ from (28.36) brings us back to the distribution of 
(28.40). Incidentally this method gives us one more derivation of the distribution of 
multiple correlations and correlation ratios when the respective variates are independent. 
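
The distribution of the ratio |b|/|m| can be checked by simulation. The sketch below assumes p = 2 independent standard normal variates with zero means and relies on the fact that a beta variate with indices ½(n − p) and ½p has mean (n − p)/n; the function names and simulation settings are illustrative.

```python
import random


def ratio_b_over_m(sample):
    """|b| / |m| for a bivariate sample given as a list of (x, y) pairs."""
    n = len(sample)
    mean_x = sum(x for x, _ in sample) / n
    mean_y = sum(y for _, y in sample) / n
    bxx = sum((x - mean_x) ** 2 for x, _ in sample)
    byy = sum((y - mean_y) ** 2 for _, y in sample)
    bxy = sum((x - mean_x) * (y - mean_y) for x, y in sample)
    # Product-sums about the origin: m_ij = b_ij + n * xbar_i * xbar_j.
    mxx = bxx + n * mean_x ** 2
    myy = byy + n * mean_y ** 2
    mxy = bxy + n * mean_x * mean_y
    return (bxx * byy - bxy ** 2) / (mxx * myy - mxy ** 2)


def simulate_mean_ratio(n, trials=20000, seed=7):
    """Average of |b| / |m| over simulated samples of n bivariate normal members."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
        total += ratio_b_over_m(sample)
    return total / trials
```

With n = 8 the simulated average should lie near 6/8 = 0.75.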


Significance of a Set of Means 


28.16. Suppose that we have a set of $k$ samples with numbers $n_1, \ldots, n_k$, each 
from a $p$-variate population. Let us also suppose that the populations have the same 
dispersion matrix but different means, that of the $j$th variate in the $l$th sample being $\mu_{j(l)}$. 
We proceed to derive a criterion for testing the means simultaneously. Our result is a 
generalisation of the testing of $k$ means in normal samples, and we shall obtain it by applying 
the same method, namely by using the likelihood criterion 
$$\lambda = \frac{p_0(\omega\ \text{max.})}{p_1(\Omega\ \text{max.})}$$
as given in equation (26.64). Here $\omega$ is the domain for which all the means of the $j$th 
variate have a common value $\mu_j$ and $\Omega$ that for which they have the more general values 
$\mu_{j(l)}$. 
Let $b_{ij(l)}$ be the function $b_{ij}$ for the $l$th sample ($l = 1, 2, \ldots, k$) and $\bar{x}_{i(l)}$ the mean 
of the $i$th variate in that sample. Put 


k 
O NE pae - (28.46) 
l=1 


where, of course, 
nm » 
bjo = P» o —£o)um—£Xp.-. . F . (28.47) 


t=1 


Put, for the functions of the pooled samples, 


T 1 
DEPT a un cen UN SER la hae pitty 
Ti 5 tru im Ti V (i) (28.48) 
b; = eo n © — Ži) (Xu — €). . NS XE ee (28740) 
If then 
mag = a (taw — Hi n) (in — yw) E TELE) 


the likelihood of all samples together is 
c|A |" exp (3 Z (n AY my») h . : d . (28.51) 
1 


where c is a constant. 
Taking logarithms and differentiating, we have for the maximum value equations 


typified by 
Io A” { (tu — Hew) + Gr — yw) ) — 9, 


which reduce to 


fq =a + + 0 + ee (28.52) 


The maximum likelihood values of the m’s are then given by 
fi n = by (UN 
rs ' diy A 
Furthermore, the values of Â” are then given by the inverse of the matrix (; hj). and the 


exponent of (28.51) becomes 
— 4n X (A by y) = — $nk. A 2 -. fs (28:58) 


We then find 
ce-ink 
pı (2 max.) = 1 TY 1 3 Ü . (28.54) 
EU 
In a similar way it will be found that 
e-ink 
po (œ max.) = a m E Er OD OR 
= «| 
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Hence : 
1 n 
m 
= | 1 iin 
ined 
and we may write 
[st] un 
- n” | ul og 
= = = 28.56 
L = An | 1 | A (28.56) 
nt 


and take Z as our criterion. 


28.17. The distribution of $L$ for general $k$ is not easily expressible, but we may 
3 s 1 A 
determine its moments by the method employed in 28.15. The functions = b; are dis- 


tributed in Wishart's form and their moments accordingly given by equations of the type 
(28.42) with n replaced by n — 1, namely, : 


r(* —m +r) 
" mr p 2 
DEEP m) j 


2 


Now each b; q is distributed in Wishart’s form, and therefore their sum is so distributed 
(cf. Exercise 28.3). In the manner of 28.15—we omit the details—it is found that 


n —m n—m--1—k 
(1 by | T "ee )r( E ; tr) 5 
i( )- II TTE . —. (08.88) 
| by | m= 'r(=” a er)r (=E 
2 


where we now use m as an index of summation, reserving k for the number of samples. 
This gives us the moments of L. 
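In the case $k = 2$ we have 
$$\mu'_r = \frac{\Gamma\!\left(\dfrac{n-1}{2}\right) \Gamma\!\left(\dfrac{n-p-1}{2} + r\right)}{\Gamma\!\left(\dfrac{n-1}{2} + r\right) \Gamma\!\left(\dfrac{n-p-1}{2}\right)}, \tag{28.59}$$
and hence the distribution of $L$ is in the form 
$$dF = \frac{1}{B\!\left(\dfrac{n-p-1}{2}, \dfrac{p}{2}\right)}\, L^{\frac12 (n-p-3)} (1 - L)^{\frac12 (p-2)}\, dL. \tag{28.60}$$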

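In the case $k = 3$ we find 
$$\mu'_r = \frac{\Gamma\!\left(\dfrac{n-1}{2}\right) \Gamma\!\left(\dfrac{n-2}{2}\right) \Gamma\!\left(\dfrac{n-p-1}{2} + r\right) \Gamma\!\left(\dfrac{n-p-2}{2} + r\right)}{\Gamma\!\left(\dfrac{n-1}{2} + r\right) \Gamma\!\left(\dfrac{n-2}{2} + r\right) \Gamma\!\left(\dfrac{n-p-1}{2}\right) \Gamma\!\left(\dfrac{n-p-2}{2}\right)},$$
which, in virtue of the relation 
$$\Gamma(x)\, \Gamma(x + \tfrac12) = \frac{\sqrt{\pi}\; \Gamma(2x)}{2^{2x-1}},$$
becomes 
$$\mu'_r = \frac{\Gamma(n-2)\, \Gamma(n-p-2+2r)}{\Gamma(n-2+2r)\, \Gamma(n-p-2)}.$$
These are the moments of the distribution 
$$dF = \frac{1}{2B(n-p-2,\, p)}\, L^{\frac12 (n-p-4)} (1 - \sqrt{L})^{p-1}\, dL,$$
a rather unusual form. The results are due to Wilks. 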


28.18. The line of generalisation of univariate analysis will now probably be clear. 
Corresponding to most of our results for a single variate there will be a generalised result 
for p variates ; and, in fact, if we like to regard the p-variate as a vector we can often draw 
direct analogies between results for vectors and those for the (univariate) scalar. It is 
of special interest to observe that the role played by the variance in univariate theory is 
taken over by the determinant of the dispersion matrix in multivariate theory. 

Up to this point we have generalised the distribution of variance (the χ²-distribution)
into Wishart's form, and the t-distribution into Hotelling's form.

Other results which suggest themselves for generalisation are regression and variance
analysis. But in a sense our treatment of regressions is already general, for we have
discussed the regression of one variate on p − 1 others. Below we shall go further and
examine the relations between p dependent and q independent variates. In vector
language, we consider the regression of a p-way vector y on a q-way vector x. We have also
considered the analysis of variance for the bivariate and trivariate case in Chapter 24
under the title of analysis of covariance, and since the interest lies mainly in the direction
of regressions we shall not take the subject further here, though it is capable of
development and even, perhaps, of application if data become available in sufficient abundance.
In the remainder of the chapter we shall, in the first instance, deal with an offshoot of
regression theory which has some interesting taxonomic applications, namely
discriminatory analysis; and we shall then proceed to the general problem of the relationship
between two sets of variates.


Discriminatory Analysis 
28.19. Suppose we have p observations for each of 2n sample members, and that
each member can have emanated from one of two populations, n to each population. We
require to find some measurement depending on the p observations which will enable us
to assign subsequently drawn members correctly to their parent populations with the
greatest assurance of success. For this purpose we shall find p quantities $\lambda^1, \ldots, \lambda^p$ and
a discriminant function X related linearly to the variates by
$$X = \lambda^1 x_1 + \lambda^2 x_2 + \ldots + \lambda^p x_p = \lambda^i x_i. \tag{28.63}$$
The criterion on which we shall rely is that the λ's must be chosen to maximise the ratio
of the difference between sample means to the standard deviation within the two classes.
Any linear function of type (28.63) has variance S, given by
$$S = \lambda^i \lambda^j a_{ij}, \tag{28.64}$$
where, as usual, $a_{ij}$ is the covariance of $x_i$ and $x_j$, which we assume to be the same for both
populations. Further, if the difference of the two means of $x_i$ is $d_i$, the difference of the
function X for the two samples is
$$D = \lambda^i d_i. \tag{28.65}$$
We have then to maximise for variation in λ the function
$$\frac{D^2}{S} = \frac{(\lambda^i d_i)^2}{\lambda^i \lambda^j a_{ij}}. \tag{28.66}$$
This gives for each λ
$$\frac{1}{S}\frac{\partial S}{\partial \lambda^i} = \frac{2}{D}\frac{\partial D}{\partial \lambda^i},$$
leading to equations typified by
$$\lambda^j a_{ij} = \frac{S}{D}\, d_i. \tag{28.67}$$
Multiplying by $a^{ik}$ and summing over i, we have
$$\lambda^j a_{ij} a^{ik} = \lambda^k = \frac{S}{D}\, d_i a^{ik},$$
or, replacing k by j,
$$\lambda^j = \frac{S}{D}\, d_i a^{ij}. \tag{28.68}$$
This determines the λ's, except for the constant S/D which can be chosen at will so far as the
discriminant function is concerned. If c is some constant, we have
$$\lambda^j = c\, d_i a^{ij}. \tag{28.69}$$

The result also holds if there are $n_1$ members in the first sample and $n_2$ in the second.
Equation (28.65) remains true, and the rest of the analysis is the same as for equal
class-numbers.


Example 28.2 (from R. A. Fisher, 1936a). 


Measurements were made on fifty specimens of flowers from each of two species of
iris, setosa and versicolor, found growing in the same colony. Four measurements were
taken, viz. sepal length, sepal width, petal length, and petal width. We denote them by
$x_1$, $x_2$, $x_3$ and $x_4$ respectively.

The means of the specimens were (in centimetres):

Variate.    Versicolor.    Setosa.    Difference (V − S).
x_1           5.936         5.006          0.930
x_2           2.770         3.428        − 0.658
x_3           4.260         1.462          2.798
x_4           1.326         0.246          1.080

The sums of squares and products about the means were (in cm.²):

          x_1         x_2         x_3         x_4
x_1     19.1434      9.0356      9.7634      3.2394
x_2      9.0356     11.8658      4.6232      2.4746
x_3      9.7634      4.6232     12.2978      3.8794
x_4      3.2394      2.4746      3.8794      2.4604

The inverse matrix is, in cm.⁻²:

            x_1            x_2            x_3            x_4
x_1      0.118,7161   − 0.066,8666   − 0.081,6158     0.039,6350
x_2    − 0.066,8666     0.145,2736     0.033,4101   − 0.110,7529
x_3    − 0.081,6158     0.033,4101     0.219,3614   − 0.272,0206
x_4      0.039,6350   − 0.110,7529   − 0.272,0206     0.894,5506

We need not bother to divide these quantities by n because there is an arbitrary
constant in our discriminant function which absorbs it. The matrices are diagonally
symmetric, and it is not always necessary to write out the values below the diagonal as we
have done here.

From (28.69), with c = 1, we then find

λ^1 = − 0.031,1511      λ^2 = − 0.183,9075
λ^3 =   0.222,1044      λ^4 =   0.314,7370.

If we choose the coefficient of $x_1$ to be unity the discriminant function is then
$$X = x_1 + 5.9037\,x_2 - 7.1299\,x_3 - 10.1036\,x_4. \tag{28.70}$$


The mean of X for versicolor, obtained by substituting the means of the x's for that species,
is found to be − 21.4815, and that for setosa is 12.3345. The difference is thus 33.816 cm.
Let us compare this with its standard error to see whether it is significant of real differences
in the values of X for the two species.
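The arithmetic of this example can be retraced in a few lines. The sketch below applies (28.69) with c = 1 to the printed inverse matrix and the printed species means; the function and variable names are illustrative and not part of the original example.

```python
# Inverse of the matrix of sums of squares and products (cm^-2), as printed.
A_INV = [
    [0.1187161, -0.0668666, -0.0816158, 0.0396350],
    [-0.0668666, 0.1452736, 0.0334101, -0.1107529],
    [-0.0816158, 0.0334101, 0.2193614, -0.2720206],
    [0.0396350, -0.1107529, -0.2720206, 0.8945506],
]
MEAN_VERSICOLOR = [5.936, 2.770, 4.260, 1.326]
MEAN_SETOSA = [5.006, 3.428, 1.462, 0.246]

def discriminant_coefficients(a_inv, d, c=1.0):
    """lambda^j = c * d_i * a^{ij}, as in equation (28.69)."""
    size = len(d)
    return [c * sum(d[i] * a_inv[i][j] for i in range(size)) for j in range(size)]

def linear_score(coeffs, x):
    """Value of the linear function sum(coeff * x) for one set of variates."""
    return sum(l * v for l, v in zip(coeffs, x))

# Differences of means, versicolor minus setosa.
d = [v - s for v, s in zip(MEAN_VERSICOLOR, MEAN_SETOSA)]
lam = discriminant_coefficients(A_INV, d)

# Rescale so that the coefficient of x_1 is unity, as in (28.70).
scaled = [l / lam[0] for l in lam]
x_versicolor = linear_score(scaled, MEAN_VERSICOLOR)
x_setosa = linear_score(scaled, MEAN_SETOSA)

print([round(l, 7) for l in lam])
print([round(s, 4) for s in scaled])
print(round(x_versicolor, 4), round(x_setosa, 4))
```

Because the constant c is arbitrary, only the ratios of the coefficients matter; the rescaling step simply fixes one convenient choice.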
From the matrix of sums of squares and products we find

$$N \operatorname{var} X = \lambda^i \lambda^j a_{ij} = 1085.5522,$$

where the λ's are, of course, the coefficients in (28.70). N here is the number of degrees
of freedom of the estimate of the variance. There are 100 members altogether, with 99
degrees of freedom, but we have eliminated four corresponding to the means of the four
variates. We therefore take N to be 99 − 4 = 95, and find

$$\operatorname{var} X = 11.4269.$$

This is the variance of a single value. That of the difference of the two means of 50 values
is obtained by division by 25 and is thus 0.4571, the corresponding standard error being
0.676.
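The standard-error step can be sketched as follows, taking the quoted value 1085.5522 and the 95 degrees of freedom as given; the variable names are illustrative.

```python
import math

N_VAR_X = 1085.5522        # lambda^i lambda^j a_ij, as quoted in the text
DEGREES_OF_FREEDOM = 99 - 4
SAMPLE_SIZE = 50           # specimens per species
OBSERVED_DIFFERENCE = 33.816

var_x = N_VAR_X / DEGREES_OF_FREEDOM
# Variance of a difference of two independent means of 50 values each:
# var_x * (1/50 + 1/50) = var_x / 25.
var_difference = var_x * (1 / SAMPLE_SIZE + 1 / SAMPLE_SIZE)
se_difference = math.sqrt(var_difference)
ratio = OBSERVED_DIFFERENCE / se_difference

print(round(var_x, 4), round(var_difference, 4), round(se_difference, 3), round(ratio, 1))
```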

The observed difference of means, viz. 33.816, is about 50 times this amount, and
there is thus a real difference in the values of X for the two species. In other words the
discriminant function is a good one. It is best among the linear functions of the x's because
we have chosen it so that the difference of two values, divided by their estimated standard
error, shall be the greatest possible. To use the function we should, given a flower of
doubtful species, calculate X for it and assign it to one species or the other according as
X were nearer to the mean value of X for one species or the other. If, of course,
the observed value differed from the mean values by more than twice the standard error
of each, we should begin to doubt whether it belonged to either.

The analysis may be put in rather a different way. Suppose we analyse the variation
of X between and within species. The sum of squares between species in the 50 × 2
classification is

$$50\{(\bar{X}_1 - \bar{X})^2 + (\bar{X}_2 - \bar{X})^2\},$$

where $\bar{X}_1$, $\bar{X}_2$ are the respective means and $\bar{X}$ the mean of the whole. This reduces to 25D².
The sum of squares within classes is 1085.55 with 95 d.f., as found above, and we have:

                        Sum of Squares.     d.f.
Between species            28,588.05          4
Within species              1,085.55         95
TOTALS                     29,673.60         99

Our method of selecting the discriminant function has been such as to minimise the sum
of squares within species and, for constant total, to maximise the sum between species,
and hence to maximise the ratio of the latter to the former. For the moment we cannot
assume that this ratio may be tested in the z-distribution in the usual way, though we shall
see presently that this is so.
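The reduction of the between-species sum of squares to 25D² is easily confirmed from the two quoted means of X. The snippet is a minimal sketch; the within-species figure is taken from the text rather than recomputed.

```python
MEAN_X_VERSICOLOR = -21.4815
MEAN_X_SETOSA = 12.3345
PER_SPECIES = 50
WITHIN_SS = 1085.55          # quoted above, 95 d.f.

grand_mean = (MEAN_X_VERSICOLOR + MEAN_X_SETOSA) / 2   # equal class-numbers
between_ss = PER_SPECIES * ((MEAN_X_VERSICOLOR - grand_mean) ** 2
                            + (MEAN_X_SETOSA - grand_mean) ** 2)

difference = MEAN_X_SETOSA - MEAN_X_VERSICOLOR          # D, up to sign
shortcut = 25 * difference ** 2

total_ss = between_ss + WITHIN_SS
print(round(between_ss, 2), round(shortcut, 2), round(total_ss, 2))
```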


28.20. The relationship of discriminatory analysis for two classes and the theory of
regression may be brought out by introducing a formal variate y for the classes. If there
are $n_1$ members in one class and $n_2$ in the other we shall assign the values

$$\frac{n_2}{n_1+n_2}, \qquad -\frac{n_1}{n_1+n_2}$$

to the y-variate for the two classes respectively. The mean of y for the whole sample is
then zero and the sum of squares is

$$\frac{n_1 n_2}{n_1+n_2} = c, \text{ say}. \tag{28.71}$$

Considering now
$$y = \lambda^i x_i \tag{28.72}$$
as a regression equation, we find for the coefficients λ
$$\Sigma(y x_i) - \lambda^j \Sigma(x_i x_j) = 0,$$
or
$$\Sigma(y x_i) - \lambda^j a_{ij} = 0. \tag{28.73}$$
Now
$$\Sigma(y x_i) = \frac{n_2}{n_1+n_2}\Sigma_1(x_i) - \frac{n_1}{n_1+n_2}\Sigma_2(x_i),$$
where the suffixes of the Σ's relate to the first and second classes,
$$= \frac{n_1 n_2}{n_1+n_2}\{(\bar{x}_i)_1 - (\bar{x}_i)_2\} = c\,d_i.$$
Thus
$$c\,d_i = \lambda^j a_{ij}, \tag{28.74}$$
which is another way of writing (28.69) with a particular value for the constant c.
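The two properties claimed for the formal variate (zero mean, and sum of squares $n_1 n_2/(n_1+n_2)$) can be checked in exact arithmetic. The class-numbers used below are arbitrary illustrative values, and the helper name is hypothetical.

```python
from fractions import Fraction

def formal_variate(n1, n2):
    """Values assigned to y for the first and second classes respectively."""
    total = n1 + n2
    return Fraction(n2, total), Fraction(-n1, total)

n1, n2 = 30, 70                      # illustrative class-numbers
y1, y2 = formal_variate(n1, n2)

mean_y = (n1 * y1 + n2 * y2) / (n1 + n2)
sum_of_squares = n1 * y1 ** 2 + n2 * y2 ** 2

print(mean_y, sum_of_squares)
```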


28.21. Pursuing the analogy with regression analysis further, we see that since

$$\Sigma(y^2) = c$$
and
$$\Sigma(y x_i) = c\,d_i$$
we may analyse the sums of squares as

Sums of squares.                    d.f.
$c\,\lambda^i d_i$                    p
$c\,(1 - \lambda^i d_i)$              $n_1 + n_2 - p - 1$
$c$                                   $n_1 + n_2 - 1$          (28.75)

as for a regression line. If R is the multiple correlation of y on the x-variates,
$$R^2 = \lambda^i d_i. \tag{28.76}$$

In ordinary regression analysis we may test the ratio R²/(1 − R²), multiplied by
suitable constants, in the z-distribution; but this depends on the assumption that the
dependent variate y is normal for any fixed x's. Here we have the case when the dependent
variate is fixed but the x's are normal. The test still holds in such a case, the reason being
the kind of duality we noted in 28.14 in arriving at Hotelling's distribution. The
distribution of angles between a fixed plane and a random vector is the same as that between
a fixed vector and a random plane. Consequently the table of (28.75) can be regarded
as an analysis of variance and the z-test applied.


28.22. We may extend the discriminant function to the case when the property to
be discriminated is not, as above, a matter of allocation to one of two classes, but to several
which may in particular be determined by certain values of a continuous variate. If we
have various measurements of p x-variates corresponding to values of a y-variate, we may
form the regression of y on the x's and use the resulting function as a discriminator. As
in the case of dichotomy, the regression will maximise the difference between classes as
compared with intra-class variation; and its significance may be tested in much the
same way.


Example 28.3 (from M. M. Barnard, 1935). 

An investigation was undertaken into the changes taking place over time of the
characteristics of certain Egyptian skulls. There were four sets of skulls, known to be from
Late Predynastic, Sixth to Twelfth, Twelfth to Thirteenth and Ptolemaic dynasties
respectively, and the relative time-intervals were taken to be in the proportions 2 : 1 : 2, so that
the values of t for the four periods may be taken to be respectively − 5, − 1, + 1, + 5.
For the skulls four measurements were selected:

x_1, maximum breadth;
x_2, basi-alveolar length;
x_3, nasal height;
x_4, basi-bregmatic height.

It is required to find a function
$$X = \lambda^1 x_1 + \lambda^2 x_2 + \lambda^3 x_3 + \lambda^4 x_4$$
which will best discriminate between skulls belonging to different periods.
The means of the series were as follows, the sample numbers also being shown:

Variate.     Series I        Series II       Series III      Series IV
             (n_1 = 91).     (n_2 = 162).    (n_3 = 70).     (n_4 = 75).
x_1          133.582,418     134.265,432     134.371,429     135.306,667
x_2           98.307,692      96.462,963      95.857,143      95.040,000
x_3           50.835,165      51.148,148      50.100,000      52.093,333
x_4          133.000,000     134.882,716     133.642,857     131.466,667


The sums of squares and products about the means are:

            x_1              x_2              x_3              x_4
x_1     9661.997,470      445.573,301     1130.623,900     2148.584,219
x_2         ...          9073.115,027     1239.221,990     2255.812,722
x_3         ...              ...          3938.320,351     1271.054,662
x_4         ...              ...              ...          8741.508,829
The mean value of t, $\bar{t}$, for the 398 observations is − 0.432,161, and the values of $t - \bar{t}$
for the four series are accordingly

− 4.567,839;  − 0.567,839;  1.432,161;  5.432,161.

The sums $\Sigma x_i (t - \bar{t})$ are respectively

x_1        718.762,86
x_2     − 1407.260,75
x_3        410.101,94
x_4      − 733.668,32

and finally, $\Sigma (t - \bar{t})^2$ = 4307.668,32.
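The weighted mean of t and the sum of squares of its deviations follow directly from the sample numbers and the time values. A minimal sketch (variable names illustrative):

```python
COUNTS = [91, 162, 70, 75]       # skulls in Series I to IV
T_VALUES = [-5, -1, 1, 5]        # time values assigned to the four periods

n_total = sum(COUNTS)
t_bar = sum(c * t for c, t in zip(COUNTS, T_VALUES)) / n_total
deviations = [t - t_bar for t in T_VALUES]
ss_t = sum(c * dev ** 2 for c, dev in zip(COUNTS, deviations))

print(n_total, round(t_bar, 6))
print([round(dev, 6) for dev in deviations])
print(round(ss_t, 5))
```

Each series contributes in proportion to its sample number, which is the weighting referred to in the closing remarks of this example.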

We could obtain the coefficients λ from the reciprocal of the matrix above on the lines
of the previous example. It is also instructive to observe, from the analogy with
regressions, that instead of that matrix we may use the matrix (depending on one extra degree
of freedom, 395 in all) obtained by adding to the sums of squares the regressions on time.
For instance, instead of 9661.997,470 we have 9661.997,470 + (718.762,86)²/4307.668,32.
The resulting matrix is

            x_1              x_2              x_3              x_4
x_1     9781.927,828      210.762,489     1199.052,135     2026.206,952
x_2         ...          9532.849,476     1105.246,827     2405.414,318
x_3         ...              ...          3977.363,203     1201.230,304
x_4         ...              ...              ...          8866.382,928

The reciprocal of this is (units = 10⁻⁶):

            x_1            x_2            x_3            x_4
x_1     110.368,975      6.938,481   − 28.145,236   − 23.361,935
x_2         ...        115.693,529   − 24.948,984   − 30.767,069
x_3         ...            ...        273.988,409   − 23.666,591
x_4         ...            ...            ...        129.990,069


The resulting values of λ are

λ^1 =   0.075,156,739,      λ^2 = − 0.145,490,050,
λ^3 =   0.144,600,884,      λ^4 = − 0.078,538,419

and these, or constant multiples of them, give us the constants in the discriminant function
which will best enable us to assign a skull to the correct period by measurements of the
four specified variates.

In this analysis we have 398 members, but of the 397 d.f. we have discarded two with
the general mean. The d.f. of the sum 4307.6683 = $\Sigma (t - \bar{t})^2$ are 395, of which four are
attributable to regressions on the other variates. For the contribution of these four we
have

λ^1 × 718.762,86 + etc. = 375.6657.

The analysis of variance is thus:

                 Sum of Squares.     d.f.     Quotient.
Regression          375.6657           4       93.9164
Remainder          3932.0026         391       10.0563
TOTALS             4307.6683         395


The analogy of the discriminant function with regressions noted above may be used
to provide standard errors of the coefficients λ. In our present case the variance of λ^1
is obtained by multiplying the remainder quotient, viz. 10.0563, by the term corresponding
to $a_{11}$ in the reciprocal matrix of sums of squares of the x's, namely 110.368,975 × 10⁻⁶.
This gives a standard error of 0.0333. We obtain finally

λ^1 =   0.0752 ± 0.0333
λ^2 = − 0.1455 ± 0.0341
λ^3 =   0.1446 ± 0.0525
λ^4 = − 0.0785 ± 0.0362.

All coefficients exceed twice their standard error, and hence all the variates are useful in
discriminating between skulls of different periods.
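The four standard errors can be reproduced from the remainder quotient and the diagonal of the reciprocal matrix. The sketch below takes the reciprocal matrix to be in units of 10⁻⁶, which is the scale that reproduces the quoted value 0.0333; the names are illustrative.

```python
import math

REMAINDER_SS = 3932.0026
REMAINDER_DF = 391
# Diagonal terms of the reciprocal matrix, converted from units of 10^-6.
RECIPROCAL_DIAGONAL = [110.368975e-6, 115.693529e-6, 273.988409e-6, 129.990069e-6]
LAMBDAS = [0.075156739, -0.145490050, 0.144600884, -0.078538419]

remainder_quotient = REMAINDER_SS / REMAINDER_DF
std_errors = [math.sqrt(remainder_quotient * term) for term in RECIPROCAL_DIAGONAL]
# Ratio of each coefficient to its standard error, in absolute value.
ratios = [abs(l) / se for l, se in zip(LAMBDAS, std_errors)]

print(round(remainder_quotient, 4))
print([round(se, 4) for se in std_errors])
print([round(r, 2) for r in ratios])
```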

I am indebted to Dr. M. S. Bartlett for the calculations of this example. His results 
differ from those reached by Miss Barnard in her original investigation since she took an 
unweighted regression of the variates with time, whereas he has weighted the values 
according to sample numbers. He also notes that the significance of the results has been 
tested above on the basis of variability within classes, but that a fuller analysis of the means, 
bringing back the two degrees of freedom discarded, reveals further differences between the 
series. Thus, though the discriminant function will efficiently sort the series examined in 
relation to their periods, we must be cautious about associating the observed differences 
with the time-changes. 


Canonical Correlations 
28.23. We now turn to consider the general theory of the relations between two
sets of variates $x_1, \ldots, x_p$ and $x_{p+1}, \ldots, x_{p+q}$, where we suppose that $p \le q$. Following
Hotelling (1936b), we shall show that in general there can be found linear transformations
to variates $\xi_1, \ldots, \xi_p$, $\xi_{p+1}, \ldots, \xi_{p+q}$ such that

(a) all the ξ's have unit variance and zero mean;
(b) any ξ in the p-group is independent of the other ξ's in that group;
(c) any ξ in the q-group is independent of the other ξ's in that group;
(d) the correlation between any ξ in the p-group and any ξ in the q-group is zero except
for p correlations $\rho_1, \ldots, \rho_p$, which may be taken to be the correlations between
$\xi_1$ and $\xi_{p+1}$, $\xi_2$ and $\xi_{p+2}$, ..., $\xi_p$ and $\xi_{2p}$.

The variates ξ are then said to be canonical variates and the ρ's canonical correlations.
This part of our work is, fundamentally, the reduction of two quadratic forms and an
associated bilinear form to canonical types and does not depend on the distribution laws
of the variates. Furthermore, the reduction can be carried out either on the population
or on the sample. In the latter case it will yield sample canonical correlations which may
be written $r_1, \ldots, r_p$ and regarded as sample-values of the parent ρ's.
We will suppose that our variates x have zero means and dispersions denoted by $\sigma_{ij}$,
where, for the time being, we use σ to denote a variance or covariance instead of the more
usual σ². Those dispersions in the p-group we denote by Greek affixes: $\sigma_{\alpha\beta}$, and those
in the q-group by Roman affixes: $\sigma_{ij}$. For a covariance of a p-variate with a q-variate
we write one Greek and one Roman affix: $\sigma_{\alpha i}$.

Consider now a particular pair of variates given by
$$\xi = c^\alpha x_\alpha, \quad \alpha = 1, \ldots, p; \qquad \eta = d^i x_i. \tag{28.77}$$
If their variances are unity we have
$$c^\alpha c^\beta \sigma_{\alpha\beta} = 1, \qquad d^i d^j \sigma_{ij} = 1. \tag{28.78}$$
We will also impose the condition that their correlation R is stationary for variations in
the coefficients c and d, i.e. that
$$R = c^\alpha d^i \sigma_{\alpha i} = \text{stationary}. \tag{28.79}$$
Equations (28.78) and (28.79) then require an unconditioned stationary value of
$$c^\alpha d^i \sigma_{\alpha i} - \tfrac{1}{2}\lambda\, c^\alpha c^\beta \sigma_{\alpha\beta} - \tfrac{1}{2}\mu\, d^i d^j \sigma_{ij}, \tag{28.80}$$
where λ and μ are undetermined multipliers. This leads to
$$c^\alpha \sigma_{\alpha i} - \mu\, d^j \sigma_{ij} = 0, \qquad d^i \sigma_{\alpha i} - \lambda\, c^\beta \sigma_{\alpha\beta} = 0. \tag{28.81}$$
Multiplying the first equation by $d^i$ and summing and the second by $c^\alpha$ and summing,
we have, in virtue of (28.78) and (28.79),
$$R = \lambda = \mu. \tag{28.82}$$

Equations (28.81) will then be soluble for the p + q unknowns c and d if the determinant
of their array vanishes, that is if, writing λ for the constants λ and μ,

$$\begin{vmatrix}
-\lambda\sigma_{11} & \cdots & -\lambda\sigma_{1p} & \sigma_{1,p+1} & \cdots & \sigma_{1,p+q} \\
\vdots & & \vdots & \vdots & & \vdots \\
-\lambda\sigma_{p1} & \cdots & -\lambda\sigma_{pp} & \sigma_{p,p+1} & \cdots & \sigma_{p,p+q} \\
\sigma_{p+1,1} & \cdots & \sigma_{p+1,p} & -\lambda\sigma_{p+1,p+1} & \cdots & -\lambda\sigma_{p+1,p+q} \\
\vdots & & \vdots & \vdots & & \vdots \\
\sigma_{p+q,1} & \cdots & \sigma_{p+q,p} & -\lambda\sigma_{p+q,p+1} & \cdots & -\lambda\sigma_{p+q,p+q}
\end{vmatrix} = 0, \tag{28.83}$$

an equation determining λ. Before studying it further we will throw the equation into
a somewhat different form.


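28.24. We may write (28.83) as

$$\begin{vmatrix} -\lambda\sigma_{\alpha\beta} & \sigma_{\alpha j} \\ \sigma_{i\beta} & -\lambda\sigma_{ij} \end{vmatrix} = 0. \tag{28.84}$$

Multiplying the first p rows by −λ and dividing the last q columns by −λ we find the
equivalent form

$$\begin{vmatrix} \lambda^2\sigma_{\alpha\beta} & \sigma_{\alpha j} \\ \sigma_{i\beta} & \sigma_{ij} \end{vmatrix} = 0. \tag{28.85}$$

Writing, in conformity with our usual notation, $(\sigma^{ij})$ for the matrix inverse to $(\sigma_{ij})$ and
remembering that
$$\sigma^{ij}\sigma_{jk} = \delta^i_k,$$
let us multiply (28.85) on the left by

$$\begin{vmatrix} \delta^\alpha_\beta & -\sigma_{\alpha k}\sigma^{ki} \\ 0 & \sigma^{ki} \end{vmatrix}. \tag{28.86}$$

The product of determinants is then

$$\begin{vmatrix} \lambda^2\sigma_{\alpha\beta} - \sigma_{\alpha i}\sigma^{ij}\sigma_{j\beta} & 0 \\ \sigma^{ki}\sigma_{i\beta} & \delta^k_j \end{vmatrix} = 0,$$

which gives
$$(-\lambda)^{q-p}\, |\lambda^2\sigma_{\alpha\beta} - \sigma_{\alpha i}\sigma^{ij}\sigma_{j\beta}| = 0, \tag{28.87}$$

a determinant with p rows and columns multiplied by a power of λ.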


28.25. Returning now to our original problem, we see that if a simple root of (28.83)
is substituted in (28.81) the c's and d's are determinate, except of course that they may be
replaced by −c and −d. For a root of multiplicity m they are determinate except for
m − 1 assignable constants, a result we take without proof from the theory of algebraic
equations (reference may be made to Hotelling's paper for details).

From (28.87) we see that the equation in λ has p + q roots. It cannot have fewer,
for the coefficient of the highest power of λ in (28.83) is the product of two principal minors
which do not vanish unless the variates are linearly dependent, a case which we exclude
from the discussion. Of these p + q roots q − p are zero. The remaining 2p can be
grouped in pairs, each of which is the negative of the other. There are thus roots which
we may write $\pm\rho_1, \ldots, \pm\rho_p$. We choose as the roots those which are not negative and
proceed to prove that they are the canonical correlations as we have defined them. That
they are, in fact, correlations follows from (28.82).

Suppose we have a root $\rho_\gamma$ and determine the corresponding constants $c_\gamma$ and $d_\gamma$ and
hence a pair of variates $\xi_\gamma$ and $\eta_\gamma$. Then we have, from (28.81),

$$c^\alpha_\gamma \sigma_{\alpha i} = \rho_\gamma\, d^j_\gamma \sigma_{ij}, \qquad d^i_\gamma \sigma_{\alpha i} = \rho_\gamma\, c^\beta_\gamma \sigma_{\alpha\beta}. \tag{28.88}$$

Similar equations obtain for a second pair, say $\xi_\delta$ and $\eta_\delta$. Between these four variates
there are six correlations, two of which are $\rho_\gamma$ and $\rho_\delta$. We wish to show that the other
four vanish. They are

$$E(\xi_\gamma \xi_\delta) = c^\alpha_\gamma c^\beta_\delta \sigma_{\alpha\beta}, \qquad E(\eta_\gamma \eta_\delta) = d^i_\gamma d^j_\delta \sigma_{ij},$$
$$E(\xi_\gamma \eta_\delta) = c^\alpha_\gamma d^i_\delta \sigma_{\alpha i}, \qquad E(\xi_\delta \eta_\gamma) = c^\alpha_\delta d^i_\gamma \sigma_{\alpha i}. \tag{28.89}$$

Multiply the first of (28.88) by $d^i_\delta$ and sum. Using (28.89), we have
$$E(\xi_\gamma \eta_\delta) = \rho_\gamma E(\eta_\gamma \eta_\delta). \tag{28.90}$$
Similarly from the second of (28.88) multiplied by $c^\alpha_\delta$,
$$E(\xi_\delta \eta_\gamma) = \rho_\gamma E(\xi_\gamma \xi_\delta). \tag{28.91}$$
Interchanging γ and δ we find from (28.90) and (28.91)
$$\rho_\gamma E(\eta_\gamma \eta_\delta) = \rho_\delta E(\xi_\gamma \xi_\delta). \tag{28.92}$$
Equally, again interchanging γ and δ in (28.92) we have
$$\rho_\delta E(\eta_\gamma \eta_\delta) = \rho_\gamma E(\xi_\gamma \xi_\delta). \tag{28.93}$$
Thus, unless $\rho_\gamma^2 = \rho_\delta^2$,
$$E(\xi_\gamma \xi_\delta) = E(\eta_\gamma \eta_\delta) = 0. \tag{28.94}$$
It follows from (28.90) and (28.91) that the other correlations also vanish.
We have only to round off the proof by showing that if ρ is a root of multiplicity m
the property still holds. This follows from the consideration that we may then choose
our c's and d's to obey certain orthogonal conditions ensuring that

$$E(\xi_\gamma \xi_\delta) + E(\eta_\gamma \eta_\delta) = 0. \tag{28.95}$$

It will then follow from (28.92) that each expectation vanishes unless $\rho_\gamma = \rho_\delta = 0$; and
even in this case, (28.90) and (28.91) show that two expectations vanish, and we may then
choose our assignable constants so that the others vanish.


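28.26. When the variates are put into canonical form the dispersion matrix reduces to

$$\begin{pmatrix}
1 & 0 & \cdots & 0 & \rho_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 & 0 & \rho_2 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & & \vdots & \vdots & & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 & 0 & 0 & \cdots & \rho_p & 0 & \cdots & 0 \\
\rho_1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \rho_2 & \cdots & 0 & 0 & 1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & & \vdots & \vdots & & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \rho_p & 0 & 0 & \cdots & 1 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 & \cdots & 0 \\
\vdots & & & \vdots & \vdots & & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & \cdots & 1
\end{pmatrix} \tag{28.96}$$

with a determinant equal to
$$(1 - \rho_1^2)(1 - \rho_2^2) \cdots (1 - \rho_p^2).$$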


Example 28.4 (from Hotelling, 1936b, dealing with data of T. L. Kelley).

140 seventh-grade school children were given four tests in (a) reading speed, (b) reading
power, (c) arithmetic speed, and (d) arithmetic power. It is required to find canonical
variates for the two reading tests and the two arithmetic tests.

The correlations between the variates were:

          x_1         x_2         x_3         x_4
x_1     1.0000      0.6328      0.2412      0.0586
x_2     0.6328      1.0000    − 0.0553      0.0655
x_3     0.2412    − 0.0553      1.0000      0.4248
x_4     0.0586      0.0655      0.4248      1.0000

The determinant (28.83) becomes

$$\begin{vmatrix}
-\lambda & -0.6328\lambda & 0.2412 & 0.0586 \\
-0.6328\lambda & -\lambda & -0.0553 & 0.0655 \\
0.2412 & -0.0553 & -\lambda & -0.4248\lambda \\
0.0586 & 0.0655 & -0.4248\lambda & -\lambda
\end{vmatrix} = 0$$

or
0.491,370 λ⁴ − 0.078,803,4 λ² + 0.000,362,490 = 0,
giving λ² = 0.155,635 or 0.004,740
with λ = 0.3945 or 0.0688.


To find the transformed variates themselves we use (28.81). For instance, with the
root 0.3945 for μ, we have

          c¹ + 0.6328 c² − 0.6114 d¹ − 0.1485 d² = 0
   0.6328 c¹ +        c² + 0.1402 d¹ − 0.1660 d² = 0
 − 0.6114 c¹ + 0.1402 c² +        d¹ + 0.4248 d² = 0
 − 0.1485 c¹ − 0.1660 c² + 0.4248 d¹ +        d² = 0

The last equation is linearly dependent on the other three, so adds nothing. In the other
three we solve for the ratios of c's and d's, finding

c¹ : c² : d¹ : d² = − 2.7772 : 2.2655 : − 2.4404 : 1.

Thus the transformed variates are
$$k_1 \xi^1 = -2.7772\,x_1 + 2.2655\,x_2$$
$$k_2 \eta^1 = -2.4404\,x_3 + x_4$$
where $k_1$ and $k_2$ may be chosen so that the variances of $\xi^1$ and $\eta^1$ are unity, if desired. Similar
equations with the root 0.0688 will give us a further pair of canonical co-ordinates. Those
we have worked out have the maximum correlation, the other pair having the minimum
and therefore being of less interest.
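For two variates in each group the roots can be obtained directly from the p-rowed determinant of 28.24, since λ² is then a latent root of the 2 × 2 matrix formed from the blocks of the correlation matrix. The sketch below works from the printed correlations; the helper functions are illustrative and handle the 2 × 2 case only.

```python
import math

# Blocks of the correlation matrix: reading tests (R11), arithmetic tests (R22),
# and the cross-correlations between the two groups (R12).
R11 = [[1.0, 0.6328], [0.6328, 1.0]]
R22 = [[1.0, 0.4248], [0.4248, 1.0]]
R12 = [[0.2412, 0.0586], [-0.0553, 0.0655]]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def multiply_2x2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose_2x2(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

def canonical_correlations_2x2(r11, r12, r22):
    """Roots lambda^2 of |lambda^2 R11 - R12 R22^-1 R21| = 0 and their square roots."""
    left = multiply_2x2(inverse_2x2(r11), r12)
    right = multiply_2x2(inverse_2x2(r22), transpose_2x2(r12))
    m = multiply_2x2(left, right)
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(trace * trace - 4.0 * det)
    squared = [(trace + root) / 2.0, (trace - root) / 2.0]
    return squared, [math.sqrt(u) for u in squared]

squared_roots, correlations = canonical_correlations_2x2(R11, R12, R22)
print([round(u, 6) for u in squared_roots])
print([round(r, 4) for r in correlations])
```

For larger groups a general latent-root routine would be needed in place of the closed-form 2 × 2 solution used here.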


28.27. In practical cases it is of some importance to know whether an observed
canonical correlation $r_1$, say, is significant of real correlation. The problem has been solved
for large samples but not completely for small samples. We shall conclude this chapter
with a short account of the main results which have been reached.

For large samples we shall show that, for the standard error of a canonical correlation,

$$\operatorname{var} r = \frac{1}{n}(1 - \rho^2)^2, \tag{28.97}$$

a remarkable result showing that the variance is the same as for a product-moment
coefficient.


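Denoting as usual the sample covariance by $a_{ij}$ we have to the first order
$$E(a_{ij}) = \sigma_{ij}. \tag{28.98}$$
To the same order,
$$E(a_{ij} a_{kl}) = \frac{1}{n^2} E\{\Sigma_\alpha (x_{i\alpha} x_{j\alpha})\, \Sigma_\beta (x_{k\beta} x_{l\beta})\}.$$

If α ≠ β the sums on the right are independent, and there are n(n − 1) such cases. When
α = β we have n terms such as

$$E(x_{i\alpha} x_{j\alpha} x_{k\alpha} x_{l\alpha}) = \sigma_{ij}\sigma_{kl} + \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}. \tag{28.99}$$

This follows from the consideration that the characteristic function of the multivariate normal
form is
$$\exp(-\tfrac{1}{2}\sigma_{ij} t_i t_j)$$
(cf. 15.12, vol. I, p. 376).
Hence we have

$$E(a_{ij} a_{kl}) = \frac{n(n-1)}{n^2}\sigma_{ij}\sigma_{kl} + \frac{n}{n^2}(\sigma_{ij}\sigma_{kl} + \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk})$$
$$= \sigma_{ij}\sigma_{kl} + \frac{1}{n}(\sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}). \tag{28.100}$$

Thus
$$E(da_{ij}\, da_{kl}) = E(a_{ij} a_{kl}) - \sigma_{ij}\sigma_{kl} = \frac{1}{n}(\sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}). \tag{28.101}$$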
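Now for any canonical correlation r we have
$$c^\alpha c^\beta a_{\alpha\beta} = 1, \qquad d^i d^j a_{ij} = 1, \qquad r = c^\alpha d^i a_{\alpha i}. \tag{28.102}$$

If now we define for the sampling deviations in c's and d's corresponding to deviations
in the a's,
$$\Delta c^\alpha = \frac{\partial c^\alpha}{\partial a_{ij}} \Delta a_{ij}, \text{ etc.}, \tag{28.103}$$
we find
$$2a_{\alpha\beta}\, c^\alpha \Delta c^\beta + c^\alpha c^\beta \Delta a_{\alpha\beta} = 0$$
$$2a_{ij}\, d^i \Delta d^j + d^i d^j \Delta a_{ij} = 0 \tag{28.104}$$
$$\Delta r = a_{\alpha i}\, c^\alpha \Delta d^i + a_{\alpha i}\, d^i \Delta c^\alpha + c^\alpha d^i \Delta a_{\alpha i}.$$

Without loss of generality we may now suppose the variates canonical and hence put
$c^1 = 1$, $c^2 = c^3 = \ldots = c^p = 0$, $d^1 = 1$, $d^2 = \ldots = d^q = 0$. We then find
$$2\Delta c^1 + \Delta a_{11} = 0, \qquad 2\Delta d^1 + \Delta a_{p+1,\,p+1} = 0,$$
$$\Delta r_1 = r_1 \Delta d^1 + r_1 \Delta c^1 + \Delta a_{1,\,p+1}. \tag{28.105}$$

Substituting from the first two in the third of these equations we have
$$\Delta r_1 = \Delta a_{1,\,p+1} - \tfrac{1}{2} r_1 (\Delta a_{11} + \Delta a_{p+1,\,p+1}). \tag{28.106}$$
Similar equations apply for any other simple root, e.g.
$$\Delta r_2 = \Delta a_{2,\,p+2} - \tfrac{1}{2} r_2 (\Delta a_{22} + \Delta a_{p+2,\,p+2}).$$
Squaring these equations and substituting from (28.101) we find
$$n E(\Delta r_1)^2 = (1 - \rho_1^2)^2$$
$$E(\Delta r_1\, \Delta r_2) = 0.$$

It follows that
$$\operatorname{var} r_1 = \frac{1}{n}(1 - \rho_1^2)^2, \qquad \operatorname{cov}(r_1, r_2) = 0 \tag{28.107}$$
to our order of approximation.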
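The large-sample formula is simple to apply. As an illustration only, the sketch below substitutes the larger sample root of Example 28.4 (r = 0.3945, n = 140) for the unknown parent value ρ; that substitution is an assumption of this sketch, not a step taken in the text.

```python
import math

def canonical_correlation_se(rho, n):
    """Large-sample standard error of a simple canonical correlation:
    sqrt((1 - rho^2)^2 / n)."""
    return (1.0 - rho ** 2) / math.sqrt(n)

# Illustration: sample root from Example 28.4 used in place of the parent value.
se = canonical_correlation_se(0.3945, 140)
print(round(se, 4))
```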


28.28. Equation (28.107) applies to a simple non-vanishing correlation. If a
canonical correlation vanishes and p = q, the result holds, with the qualification that sample
values of r near the zero root must be allowed to have positive or negative values, or
alternatively that the distribution of r is that of absolute values of a normal variate (cf. Exercise
28.7). If p = 2, q > 2, a zero root is of multiplicity q at least. In this case, if it has exactly
multiplicity q, nr² is distributed as χ² with q − 1 degrees of freedom. For the proof of
this result see Hotelling (1936b).

There is another rather curious difficulty in testing the significance of roots of the
equation giving the canonical correlations, namely, that if several roots exist it is not
possible to relate them with certainty to specified parent correlations; any one might have
arisen from any one of the parent values. This is not serious for large samples when the
roots are distinct, since the sample values cluster closely round the parent values; but
for small samples or canonical correlations in the parent which are close together it presents
a theoretical problem of a novel kind. See Hotelling (1936b) and Bartlett (1941) on
this point.


28.29. We proceed to find the sampling distribution of canonical correlations in the
case when the parent values are all zero and the p-variates and q-variates accordingly
independent.

Reverting to equation (28.87) in the form appropriate to samples, we have
$$|\lambda^2 a_{\alpha\beta} - a_{\alpha i}\, a^{ij} a_{j\beta}| = 0. \tag{28.108}$$
We write
$$t_{\alpha\beta} = a_{\alpha i}\, a^{ij} a_{j\beta} \tag{28.109}$$
and
$$a_{\alpha\beta} = z_{\alpha\beta} + t_{\alpha\beta}, \tag{28.110}$$
so that (28.108) becomes
$$|\lambda^2 (z_{\alpha\beta} + t_{\alpha\beta}) - t_{\alpha\beta}| = 0. \tag{28.111}$$
The significance of this device is that z and t are distributed independently in Wishart's
form, as we now proceed to show.


One instructive way of looking at the problem is to consider the regression of the
p-way vector y on a q-way vector x. Corresponding to the univariate equation
$$y = bx + e, \tag{28.112}$$
where e is a residual, we have
$$y_\alpha = b^i_\alpha x_i + e_\alpha, \tag{28.113}$$
where the b's are given by minimising the sum of n values
$$\Sigma (y_\alpha - b^i_\alpha x_i)^2,$$
namely, by
$$\Sigma (y_\alpha x_j) - b^i_\alpha \Sigma (x_i x_j) = 0$$
or, in our notation for canonical variates,
$$a_{\alpha j} - b^i_\alpha a_{ij} = 0,$$
which yields
$$b^i_\alpha = a_{\alpha j}\, a^{ji}. \tag{28.114}$$
We may analyse the variance of y in the form
$$\Sigma (y_\alpha y_\beta) = \Sigma (b^i_\alpha x_i + e_\alpha)(b^j_\beta x_j + e_\beta) = b^i_\alpha b^j_\beta a_{ij} + \Sigma (e_\alpha e_\beta), \tag{28.115}$$
corresponding to the univariate case
$$\Sigma (y^2) = b^2 \Sigma (x^2) + \Sigma (e^2),$$
and the two constituents on the right in (28.115) are independent, just as in the univariate
case. This may be shown by a direct extension of 22.19.

Furthermore, if we wish to find the linear function of the y's, say $\lambda^\alpha y_\alpha$, which has
maximum correlation with the x's, we have to maximise the ratio
$$r^2 = \frac{\lambda^\alpha \lambda^\beta\, b^i_\alpha b^j_\beta a_{ij}}{\lambda^\alpha \lambda^\beta\, a_{\alpha\beta}}. \tag{28.116}$$
This is equivalent to maximising unconditionally
$$\lambda^\alpha \lambda^\beta (b^i_\alpha b^j_\beta a_{ij} - r^2 a_{\alpha\beta}),$$
giving, for r², the equation
$$|b^i_\alpha b^j_\beta a_{ij} - r^2 a_{\alpha\beta}| = 0. \tag{28.117}$$
Now in virtue of (28.114) this reduces to
$$|r^2 a_{\alpha\beta} - a_{ij}\, a_{\alpha k} a^{ki}\, a_{\beta l} a^{lj}| = 0$$
or
$$|r^2 a_{\alpha\beta} - a_{\alpha j}\, a^{jl} a_{l\beta}| = 0, \tag{28.118}$$
which is equivalent to (28.108) with a slight change of notation. This must be so, for
we arrived at both equations on essentially the same assumptions. Now we see that the
term on the right in the determinant of (28.118) is the first item on the right of the variance
analysis given by (28.115), and the other term in the determinant is the sum Σ(y²) of the
analysis. It follows that z and t of (28.111) are independent, for they are the constituent
items of the analysis. Furthermore, the z's will be distributed as sums of squares or
products about the means with n − q degrees of freedom, that is in Wishart's form; and
similarly the t's are distributed as q sums of squares or products about the origin, i.e. in
Wishart's form with n = q + 1.


28.30. Without loss of generality we may take the parent variances to be unity;
the covariances are zero by hypothesis. The joint distribution of z and t is then, from
(28.26),


p. 

| £ [E7279 | z |i 7272-9. exp f £ AC + si) prati 
dF = st ~~ (28.119) 
QP (n+1) miv w-1 [T C E =) r(*-1—1) 


i=1 2 


In the determinant 
142 (z +4) —t| =0 


put u = 2° and let the roots in u be arranged in descending order of magnitude. Consider 
the distribution for a given value of t;; and z; which in particular we take to be ôy. Let us 
choose new variates from a set £j; obeying the orthogonality conditions— 


n 


Dy (£x. en) = Ôij 


k=l 
z0if $5; 
CSI DNE NL NEUE (289190) 
Make the transformation tj = z (Sie Eje Ur) A $ 7 . (28.121) 
lg; + Ziy = Z (Eg £y) = 05. : a . . (28.122) 
k 


AT 
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Instead of the }p(p + 1) values of t we will take the p values of u and $p(p — 1) of the 
£s as our new variates, We have 


p 
| t| = | Eix Sem | = I uy . . . . . (28.123) 


p 
|z| = lénin (l iyw |= A) e» (28.124) 


and have only to consider the Jacobian. This is clearly of degree 3p (p — 1) in u, for the 
Jacobian of t and z + t is the same as that of ¢ and z and only ¢ contributes factors in u 

» inthe former. Furthermore, every term (u; — uj), i <j is a factor of J. For consider 
$u_1 - u_2$ and let us take as our $\xi$-variates those for which $j > i$. Then to satisfy the conditions
on the others, derivable from (28.120),


@ = 
a5; > 6n Sw) =0, 
An fa "NET 


we must have Tn Eo aye 


ale NOR 
Waco eh 
EN 
OS Te SE, 
whence 0h. — 054 (Eix Six Ux) 


= — att (uy — Us). . ; ; z . (28.125) 
Ëu ; 


Thus every term $(u_i - u_j)$ occurs in J, and there can be no further factors in u because
the power in u is $\frac12 p(p-1)$.
Substituting in (28.119) we have, integrating out the $\xi$-variates,

$$dF = c \prod_{i=1}^{p} u_i^{\frac12(q-p-1)}\,(1-u_i)^{\frac12(n-q-p-2)} \prod_{i<j} (u_i - u_j) \prod_{i=1}^{p} du_i, \tag{28.126}$$
where
$$c = \frac{k}{\prod_{i=1}^{p} \Gamma\left(\frac{q+1-i}{2}\right)\Gamma\left(\frac{n-q-i}{2}\right)}. \tag{28.127}$$

The constant k arises from terms involving n and p in the original density and from the
Jacobian. It therefore does not involve q and may be written k(n, p). Evaluation of
k by direct integration is a matter of some difficulty, but we may find it indirectly
as follows:

In (28.126), if we increase q and n by 2s, the corresponding value of c is
$$\frac{k(n+2s,\,p)}{\prod_{i=1}^{p} \Gamma\left(\frac{q+2s+1-i}{2}\right)\Gamma\left(\frac{n-q-i}{2}\right)}. \tag{28.128}$$

The only other term in (28.126) which is affected is that in $\prod (u)$ and, with the original
c of (28.127), the integral of the distribution so modified would give us the moment of
order s of $\prod (u)$, namely of $|t|$. This may be found in the manner of 28.15 to be

$$\prod_{i=1}^{p} \frac{\Gamma\left(\frac{q+2s+1-i}{2}\right)\Gamma\left(\frac{n-i}{2}\right)}{\Gamma\left(\frac{q+1-i}{2}\right)\Gamma\left(\frac{n+2s-i}{2}\right)} \tag{28.129}$$
(see Exercise 28.11). It follows that
$$\frac{k(n+2s,\,p)}{k(n,\,p)} = \prod_{i=1}^{p} \frac{\Gamma\left(\frac{n+2s-i}{2}\right)}{\Gamma\left(\frac{n-i}{2}\right)}, \tag{28.130}$$
whence
$$k(n,\,p) = f(p) \prod_{i=1}^{p} \Gamma\left(\frac{n-i}{2}\right). \tag{28.131}$$
It remains to evaluate f (p). To do so we make the substitution in (28.120) 
u =, 
letting n tend to infinity. Our distribution becomes 
d (a-p-1) 
ap 2 1G) Uy P p Ew) I o o) dm .  . (28,132) 
r(2 +2-% 
2 
This may be reduced by successive substitutions of the type 
Vi = UU, Vy =W + 1, vean 


and choosing q at each stage so that the term in $\prod (v)$ vanishes (as we may, since the result
is independent of q). On integration for $v_1$, then repeating the process, and so on, we find


f) UrD(pc1-i,|| 
p+2—i 2ip(p-i) v 
ari 


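Using the relation
$$\Gamma(x)\,\Gamma(x + \tfrac12) = 2^{1-2x}\,\sqrt{\pi}\;\Gamma(2x),$$
we have
$$f(p) = \frac{\pi^{\frac12 p}}{\prod_{i=1}^{p} \Gamma\left(\frac{p+1-i}{2}\right)}. \tag{28.133}$$
Thus our distribution is finally
$$dF = c \prod_{i=1}^{p} u_i^{\frac12(q-p-1)}\,(1-u_i)^{\frac12(n-q-p-2)} \prod_{i<j} (u_i - u_j) \prod_{i=1}^{p} du_i, \tag{28.134}$$
where
$$c = \pi^{\frac12 p} \prod_{i=1}^{p} \frac{\Gamma\left(\frac{n-i}{2}\right)}{\Gamma\left(\frac{n-q-i}{2}\right)\Gamma\left(\frac{q+1-i}{2}\right)\Gamma\left(\frac{p+1-i}{2}\right)}, \tag{28.135}$$
a remarkable form obtained in the general case by Fisher (1939b), P. L. Hsu (1939b), and
Roy (1939).
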
We have supposed throughout that q > p. In the contrary case we reverse the roles
of q and p and hence merely have to interchange p and q in (28.134) and (28.135). 


28.31. Let us consider some special cases. When q = 1 the distribution becomes

$$dF = \frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{p}{2}\right)\Gamma\left(\frac{n-p-1}{2}\right)}\; u_1^{\frac12(p-2)}\,(1-u_1)^{\frac12(n-p-3)}\,du_1; \tag{28.136}$$

the canonical correlation is then the multiple correlation between the q-variate and the
p-variates; and as the former is measured from its mean there is one fewer degree of
freedom, i.e. n is replaced by n - 1.

When q = 2 we have

$$dF = \frac{\Gamma(n-2)}{4\,\Gamma(n-p-2)\,\Gamma(p-1)}\,(u_1 u_2)^{\frac12(p-3)}\,\{(1-u_1)(1-u_2)\}^{\frac12(n-p-4)}\,(u_1 - u_2)\,du_1\,du_2. \tag{28.137}$$


Writing
$$(1-u_1)(1-u_2) = v, \qquad u_1 + u_2 = w,$$
we find
$$dF = \frac{\Gamma(n-2)}{4\,\Gamma(n-p-2)\,\Gamma(p-1)}\,(v - 1 + w)^{\frac12(p-3)}\,v^{\frac12(n-p-4)}\,dv\,dw. \tag{28.138}$$
For given v the limits of w are $1 - v$ and $2(1 - \sqrt{v})$, and integrating for w we find
$$dF = \frac{\Gamma(n-2)}{2\,\Gamma(n-p-2)\,\Gamma(p)}\,(1-\sqrt{v})^{p-1}\,v^{\frac12(n-p-4)}\,dv$$
or, for $\sqrt{v}$,
$$dF = \frac{1}{B(n-p-2,\,p)}\,(\sqrt{v})^{n-p-3}\,(1-\sqrt{v})^{p-1}\,d\sqrt{v}, \tag{28.139}$$
a result due to Wilks; cf. equation (28.62).
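
As a numerical cross-check of the two special cases, the following Python sketch evaluates the constant c of (28.135) on a logarithmic scale and compares it with the closed forms for q = 1 and q = 2. It assumes that the constant has the form $c = \pi^{p/2} \prod_i \Gamma\{\frac12(n-i)\} / [\Gamma\{\frac12(n-q-i)\}\,\Gamma\{\frac12(q+1-i)\}\,\Gamma\{\frac12(p+1-i)\}]$; the function name `log_c` and the trial values n = 20, p = 5 are illustrative choices and not part of the text.

```python
from math import lgamma, log, pi, isclose


def log_c(n, p, q):
    """Logarithm of the constant c in the form assumed above (needs q >= p)."""
    total = 0.5 * p * log(pi)
    for i in range(1, p + 1):
        total += lgamma((n - i) / 2)
        total -= lgamma((n - q - i) / 2)
        total -= lgamma((q + 1 - i) / 2)
        total -= lgamma((p + 1 - i) / 2)
    return total


n, p = 20, 5  # hypothetical sample number and number of p-variates

# q = 1: the roles of p and q are interchanged, so the smaller set has one member.
log_c_q1 = log_c(n, 1, p)
log_beta_form = lgamma((n - 1) / 2) - lgamma(p / 2) - lgamma((n - p - 1) / 2)

# q = 2: the constant should reduce to Gamma(n-2) / {4 Gamma(n-p-2) Gamma(p-1)}.
log_c_q2 = log_c(n, 2, p)
log_wilks_form = lgamma(n - 2) - log(4) - lgamma(n - p - 2) - lgamma(p - 1)

print(isclose(log_c_q1, log_beta_form, abs_tol=1e-9))
print(isclose(log_c_q2, log_wilks_form, abs_tol=1e-9))
```

Working with `lgamma` rather than the Gamma function itself avoids overflow for large n; the reduction for q = 2 rests on the duplication formula for the Gamma function.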


28.32. The distribution of the u's does not immediately provide a test of significance
of the canonical correlations, except when there is only one of them. The criterion
$$v = \prod (1 - u) \tag{28.140}$$
is sometimes useful in the general case for testing simultaneously the departure of the
u's from zero. Cf. Exercises 28.11 and 28.12.


NOTES AND REFERENCES 


Among earlier papers in which various aspects of the multivariate problem began to
be studied, reference may be made to Karl Pearson (1926b) on the “coefficient of racial
likeness” and Ragnar Frisch (1929), who independently arrived at the dispersion matrix
and proposed to call its determinant in standard measure the “scatterance”. Reference
to the papers by Wishart (1928), Wishart and Bartlett (1933c) and Hotelling (1931) on the
generalised product-moment distribution and the generalised “Student” ratio has been
made in the text.

In more recent literature three lines of development are discernible:

(a) American writers have developed the theory of canonical correlation and multiple
analysis mainly on algebraic and analytical lines. See Hotelling (1933, 1936b), Wilks
(1932b, 1934, 1935b, 1935c, 1936, 1943), Girshik (1939), and Madow (1938).

(b) English schools have investigated the theory of discriminant functions and developed
the sampling theory of canonical roots. See R. A. Fisher (1936a, b, 1938c, 1939b,
1940d), P. L. Hsu (1938c, 1939b, 1941a, c, d), and for illustrative material Martin (1936),
Barnard (1935), Fairfield Smith (1936) and Wallace and Travers (1938). See also Bartlett
(19340, 1938c, 1939b, c, 1941), E. S. Pearson and Wilks (19330), Welch (1939b), Lawley
(1938) and Bishop (1939). Simaika (1941) has proved that tests based on Hotelling's T
and the multiple correlation coefficient are uniformly most powerful in the class depending
on a single parameter.

(c) The Indian school, whose contribution has not been referred to in this chapter,
has developed some interesting work based on what is known as the $D^2$-statistic. See
Mahalanobis (1930, 19362), Mahalanobis, Bose and Roy (1938b), R. C. Bose (1936a), R. C.
Bose and Roy (1938c), and later papers in Sankhya. If, with two samples from p-variate
populations, $d_i$ is the difference of sample means for the ith variate, the studentised
$D^2$-statistic is


DUE ; al d; dj, 


where $a^{ij}$ refers to the reciprocal of the sample dispersion matrix. Bose and Roy have
shown that in normal samples this has the same distribution as one of Fisher's forms for 
the multiple correlation coefficient. The corresponding parameter for the population 


A = 50 89; 


is known as Mahalanobis’s generalised distance. 


EXERCISES 


28.1. In a four-variate normal distribution show that the correlation between the
covariances $a_{12}$ and $a_{34}$ is
$$\frac{\rho_{13}\,\rho_{24} + \rho_{14}\,\rho_{23}}{\sqrt{(1 + \rho_{12}^2)(1 + \rho_{34}^2)}}$$
(Wishart, 1928.)


28.2. For a pair of normal variates with correlation p, show that, defining v by 


we have for the frequency function of v 
= J(n—1) epv 
fi) = LEE (o1 Ka (0) } 
VI r( 
for v > 0 and a similar expression with -v for v inside curly brackets if v < 0. Here
K is the Bessel function of second kind with imaginary argument.

(Wishart and Bartlett, 1933c. See also K. Pearson and others, 1929.)


28.3. Show that if k sets of variates $a_{ij}^{(h)}$, h = 1 ... k; i, j = 1 ... p are each
distributed in Wishart's form, with sample numbers $n_1, \ldots, n_k$, then the variates
$$a_{ij} = \sum_{h=1}^{k} a_{ij}^{(h)}$$
are also distributed in Wishart's form with $n = \sum_{h=1}^{k} (n_h)$. (This follows readily from the
characteristic function. It is a generalisation of the additive properties of $\chi^2$.)


28.4. If a sample of n is chosen from a p-variate normal population, the variates 
being grouped into & classes z, 2; . . . pi ppi ++ + Maps eed Vp 
. «$5, consider the function— : 


where ry = 1 and +) is zero if the variates belong to different classes and equals the cor- 
relation ry if they belong to the same class. 
By considering the function 
Am yi 
show that 


Y 
"^ 
m 
LJ 
=| 
ER] 
= 
| 
=. 
+ 
* 
lm 
IN 
zi 
Em 
= 
ei 
ae 


(Wilks, 1935b. The distribution provides a test of the independence of k sets of normal variates.) 


28.5. As a particular case of the last exercise, show that if a single variate x, is 
independent of a second set z, . . . v, then— 


aae) 
mae) 


and hence find the distribution of the multiple correlation coefficient when the parent 
coefficient is zero. 


(Wilks, 1935b.) 


28.6. Show algebraically that Hotelling's T is invariant under linear transformations
of the p variates.


28.7. If the determinantal equation (28.83) with p = q has a double root equal to
zero, show that for large samples the value of r corresponding to the canonical correlation
is given by omitting all terms in the determinant when expanded, except those in $\lambda^2$ and
$\lambda^4$. Noting that the latter is a perfect square, show that r is the ratio of a polynomial
in the sample dispersions to a non-vanishing function regular in the neighbourhood of
zero. Hence that (28.107) holds when $\rho = 0$.

(Hotelling, 1936b.)


28.8. In the notation of 28.23, if 


A= | oa, |; B= | a; | 
0E | [^ Cu 
C= |-------- b a D= | Ae seats been ies 
T ` oy Gin Tij 


KENES 


and the square of the vector alienation coefficient Z defined by 


D 
Bote: 


are invariant under linear transformations of the variates. Also that


K=+pipr+++ Pp 
Z = (1 — pj) (1 — e)... (lL — pi) 


where the $\rho$'s are canonical correlations.
(Hotelling, 1936b.) 


28.9. In the notation of the previous exercise, k and z being the sample values of 
K and Z, show that if the population canonical correlations are all distinct, 


Wl ee Sf = i)? 
vark —-K MESE 


i-1 


24 7i yl 
var =- Z* D "pt 


i=l 


n 
cov (k, 2) = — KZ (ee), 
i=1 


In particular, when p = 2, 


var k= = {(1 — K)?—~Z(1+K)} 
varz = "(1 —Z+ K?) 


cov (k, 2) = — * KZ (14-Z — K?). 
(Hotelling, 1936b.)

28.10. In the previous exercise, with p = q = 2, show that, in standard measure,
Le Tis o4 — Tia T23 
TOS SE TC 75) ¥ 
and hence derive a test of significance of the “tetrad difference” $r_{13}\,r_{24} - r_{14}\,r_{23}$.
(Hotelling, 1936b.) 


28.11. In the notation of Exercise 28.9, show that 


E (bz) ABC A» JEG ae Je | 


ee dec 


28.12. Find the characteristic function of $-\log z$, where z is defined as in the
previous exercise, and hence show that $-n \log z$ or, to a better approximation,
$-\{n - 1 - \frac12(p + q + 1)\} \log z$ tends to be distributed as $\chi^2$ with pq degrees of freedom
when n is large.


(Girshik, 1939.) 


(Bartlett, 1938c.) 


CHAPTER 29 
TIME-SERIES—(1) 


29.1. A time-series, as its name indicates, is a series of values assumed by a variable 
at different points of time. We shall consider only cases where the variable is univariate 
and shall denote its value at time t by $u_t$. The study of such series forms an important
branch of statistics because the majority of types of time-variation encountered in practice
are not of the regular functional type in which $u_t$ can be represented exactly by a mathematical
function of t, but present in some degree those irregularities of a random character
which can only be discussed in terms of probability. One of our main problems, in fact, 
will be to isolate systematic from casual effects in the series so as to be able to study 
them separately. 


29.2. In general it is possible to observe a time-variable at any instant, and thus 
the temporal intervals between successive members of the series need not be the same. 
Practice and theory alike, however, usually require the observations to occur at regular 
intervals, and in the sequel we shall assume, unless the contrary is specifically stated, that 
the interval from each observation to the next is the same throughout the series. As 
a matter of convenience we may take this interval as our time-unit and write the series as

$$u_1,\ u_2,\ u_3,\ \ldots,\ u_t,\ \ldots \tag{29.1}$$

where t must be an integer. Where a series extends backwards and forwards from some
given point which we wish to regard as origin we may write it as

$$\ldots,\ u_{-t},\ \ldots,\ u_{-2},\ u_{-1},\ u_0,\ u_1,\ u_2,\ \ldots,\ u_t,\ \ldots \tag{29.2}$$

In this chapter and the next we shall study the way in which $u_t$ varies with t, such variation
being in general of the stochastic type, that is to say, involving random variables.


Some Examples of Time-series 

29.3. Tables 29.1 to 29.5 provide some examples of the kind of variation encountered
in practice. Table 29.1 (illustrated in Fig. 29.1) gives the annual yields per acre of barley
in England and Wales from 1884 to 1939. Table 29.2 (Fig. 29.2) shows the human population
of England and Wales at ten-yearly intervals from 1811 to 1931. Table 29.3 (Fig. 29.3)
gives the sheep population of England and Wales for each year from 1867 to 1939.
Table 29.4 (Fig. 29.4) gives the annual rainfall in London for each year from 1813 to 1912.
Table 29.5 (Fig. 29.5) gives the average egg-production per laying hen in the U.S.A. for
each month of the years 1938 to 1940.
TABLE 29.1

Annual Yields per Acre of Barley in England and Wales from 1884 to 1939.
(Data from the Agricultural Statistics.)

Yield per                Yield per
acre (cwt.).    Year.    acre (cwt.).
   14.2         1926        16.0
   15.8           27        16.4
   15.7           28        17.2
   14.1           29        17.8
   14.8           30        14.4
   14.4           31        15.0
   15.6           32        16.0
   13.9           33        16.8
   14.7           34        16.9
   14.3           35        16.6
   14.0           36        16.2
   14.5           37        14.0
   15.4           38        18.1
   15.3           39        17.5

Fig. 29.1. Graph of the Data of Table 29.1 (Barley Yields per Acre). [Axes: yield (cwt. per acre) against years, 1880 to 1940.]
TABLE 29.2

Population of England and Wales at Ten-Yearly Intervals from 1811 to 1931.
(Data from the Registrar-General's Statistical Review, 1933, Part II.)

Year.    Population (millions).
1811        10.16
  21        12.00
  31        13.90
  41        15.91
  51        17.93
  61        20.07
  71        22.71
  81        25.97
  91        29.00
1901        32.53
  11        36.07
  21        37.89
  31        39.95

Fig. 29.2. Graph of the Data of Table 29.2 (Population of England and Wales). [Axes: population (millions) against years, 1811 to 1931.]

TABLE 29.3

Sheep Population of England and Wales for each Year from 1867 to 1939.
(Data from the Agricultural Statistics.)

        Population          Population          Population          Population
Year.   (10,000).   Year.   (10,000).   Year.   (10,000).   Year.   (10,000).
1867      2203      1886      1892      1905      1823      1924      1484
  68      2360        87      1919        06      1843        25      1597
  69      2254        88      1853        07      1880        26      1686
  70      2165        89      1868        08      1968        27      1707
  71      2024        90      1991        09      2029        28
  72      2078        91      2111        10      1996        29
  73      2214        92      2119        11      1933        30
  74      2292        93      1991        12      1805        31
  75      2207        94      1859        13      1713        32
  76      2119        95      1856        14      1726        33
  77      2119        96      1924        15      1752        34
  78      2137        97      1892        16      1795        35
  79      2132        98      1916        17      1717        36
  80      1955        99      1968        18      1648        37
  81      1785      1900      1928        19      1512        38
  82      1747        01      1898        20      1338        39
  83      1818        02      1850        21      1383
  84      1909        03      1841        22      1344
  85      1958        04      1824        23      1384

Fig. 29.3. Graph of the Data of Table 29.3 (Sheep Population). [Axes: sheep population (millions) against years, 1865 to 1945.]
TABLE 29.4

Total Annual Rainfall at London in Inches, for each Year from 1813 to 1912.
(Data from D. Brunt, Phil. Trans. A, 225, 247, 1925.)

        Rainfall            Rainfall            Rainfall            Rainfall
Year.   (inches).   Year.   (inches).   Year.   (inches).   Year.   (inches).
1813     23.56      1838     21.63      1863     21.59      1888     27.74
  14     206-07       39     27.49        64     16.93        89     23.85
  15     21.86        40     19.43        65     29.48        90     21.23
  16     31.24        41     31.13        66     31.60        91     28.15
  17     23.65        42     23.09        67     20.25        92     22.61
  18     23.88        43     25.85        68     23.40        93     19.80
  19     26.41        44     22.65        69     25.42        94     27.94
  20     22.67        45     22.75        70     21.32        95     21.47
  21     31.69        46     26.36        71     25.02        96     23.52
  22     23.86        47     17.70        72     33.86        97     22.86
  23     24.11        48     29.81        73     22.67        98     17.69
  24     32.43        49     22.93        74     18.82        99     22.54
  25     23.26        50     19.22        75     28.44      1900     23.28
  26     22.57        51     20.63        76     26.16        01     22.17
  27     23.00        52     35.34        77     28.17        02     20.84
  28     27.88        53     25.89        78     34.08        03     38.10
  29     25.32        54     18.65        79     33.82        04     20.65
  30     25.08        55     23.06        80     30.28        05     22.97
  31     27.76        56     22.21        81     27.92        06     24.26
  32     19.82        57     22.18        82     27.14        07     23.01
  33     24.78        58     18.77        83     24.40        08     23.67
  34     20.12        59     28.21        84     20.35        09     26.75
  35     24.34        60     32.24        85     26.64        10     25.36
  36     27.42        61     22.27        86     27.01        11     24.79
  37     19.44        62     27.57        87     19.21        12     27.88

Fig. 29.4. Graph of the Last 50 Terms of the Data of Table 29.4 (Rainfall). [Axes: annual rainfall (inches) against years, 1860 to 1910.]
TABLE 29.5

Average Number of Eggs per Laying Hen in the U.S.A. for each Month of the Years 1938-1940.
(Data from Report of the Bureau of Agricultural Economics, U.S. Dept. of Agriculture, on the
Poultry and Egg Situation, March, 1941.)

Year. | Jan. | Feb. | Mar. | Apr. | May | June | July | Aug. | Sept. | Oct. | Nov. | Dec.
1938 T9 9-9 | 15-4 | 17-5 | 173 | 149 | 136 | 11:8 | 9:4 T5 | 5t 6:4 | 
1939 8-0 9-7 | 149 | 170 | 17-0 | 14:6 | 13-2 | 11-7 9:3 T4 5 68 | 
1940 T3 9-0 | 144 | 16:5 | 17-0 | 14-8 | 13-4 | 1:8 | 9-7 | T9 68 | 


Fig. 29.5. Graph of the Data of Table 29.5 (Egg Production). [Axes: average number of eggs per hen against months, 1938 to 1940.]


These series are fairly typical of the kind of material with which our theory has to
deal. The data of Table 29.1 (barley yields) present a very irregular fluctuation, and so
far as the eye can see (which is not a decisive test) there is no systematic oscillation and no
regular movement in mean yields over the period. By contrast, Table 29.2 (human population)
shows a relatively smooth movement without apparent oscillation. Table 29.3 (sheep
population) combines a general decline in numbers with marked oscillatory effects which,
though not perfectly regular, appear to be systematic to some extent. Tables 29.4 and
29.5 exhibit an oscillatory effect which is definitely seasonal for the latter and much less
regular for the former, neither indicating a variation, in the periods covered, of the average
values about which the series oscillate.


29.4. It must not be overlooked that our method of determining the values of the
series at fixed equal intervals of time may suppress evidence of oscillatory movements
which have a period equal to those intervals or to some sub-multiple of them. Suppose,
for instance, that there was a systematic oscillation in the English population expressible
by a harmonic component with period of exactly 10 years, or exactly 5 years, or exactly
3⅓ years. Clearly, by observing the series at 10-yearly intervals we should never find any
evidence of this effect, for it would contribute exactly the same amount to each observation,
without oscillation. In the population case, of course, we have collateral evidence to
indicate that no such oscillation exists, but where nothing is known of the series otherwise
we can never exclude the possibility of a period exactly equivalent to our time-interval.
Sometimes, in fact, we know that it is there, and choose our interval so as to exclude the
oscillation from consideration. For instance, in our sheep population we know that there
is a seasonal effect within the year, which is not brought out in Table 29.3 because the
sheep census is taken on June 4th each year; and again, in the rainfall data of Table 29.4
we have taken as representing the year the whole rainfall within the year, knowing quite
well that rainfall is seasonal to some extent, even in London.
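
The suppression of such a component can be seen in a small Python sketch. It samples a purely hypothetical harmonic (unit amplitude, arbitrary phase) at the census years 1811, 1821, ..., 1931 and measures the spread of the sampled values; the function names and the contrasting 7-year period are illustrative assumptions only.

```python
import math


def harmonic(year, period, amplitude=1.0, phase=0.3):
    """A hypothetical harmonic component with the given period in years."""
    return amplitude * math.sin(2 * math.pi * year / period + phase)


def sample_spread(period, years=range(1811, 1932, 10)):
    """Range (max minus min) of the harmonic when observed only at the given years."""
    values = [harmonic(y, period) for y in years]
    return max(values) - min(values)


# Periods equal to the 10-year interval or a sub-multiple of it contribute
# the same amount to every observation, apart from rounding error.
for period in (10.0, 5.0, 10.0 / 3.0):
    print(period, sample_spread(period))

# A period that is not a sub-multiple of the interval shows up clearly.
print(7.0, sample_spread(7.0))
```

For the first three periods the spread is zero apart from floating-point rounding, so the ten-yearly observations carry no trace of the oscillation.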


29.5. A general survey of these and similar series suggests that the typical time-series
may be regarded as composed of three parts:

(a) a trend, or long-term movement;

(b) an oscillation about the trend of greater or less regularity;

(c) a “random”, “irregular” or “unsystematic” component.

It is customary to regard the series as composed of these elements superposed one on 
another; that is to say, we consider the movement of the series as the sum of three dif- 
ferent components which may be generated by different causal systems. Particular series, 
of course, need not exhibit them all. That of Table 29.2 (human population) seems 
to be almost entirely trend, with perhaps a small unsystematic residual, whereas that of 
Table 29.5 (egg production) appears to be entirely oscillatory, and very regularly so. 
But some series at least exhibit all three. 
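
The additive scheme just described can be sketched in Python as follows. The particular trend, oscillation and random component below are invented purely for illustration (a linear decline, a harmonic of period 9 and normal deviates); nothing in the text prescribes these forms.

```python
import math
import random


def synthetic_series(n=60, seed=0):
    """Build a hypothetical series as the sum of three superposed components."""
    rng = random.Random(seed)
    trend = [20.0 - 0.1 * t for t in range(n)]                       # long-term movement
    oscillation = [2.0 * math.sin(2 * math.pi * t / 9.0) for t in range(n)]
    residual = [rng.gauss(0.0, 0.5) for _ in range(n)]               # unsystematic part
    series = [a + b + c for a, b, c in zip(trend, oscillation, residual)]
    return trend, oscillation, residual, series


trend, oscillation, residual, series = synthetic_series()
print(series[:5])
```

Setting the oscillation or the residual to zero gives series of the kind exemplified by Table 29.2, which seems to be almost entirely trend.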


29.6. The primary problem of time-series analysis from the statistical viewpoint is 
to isolate the three factors for individual study, and in this chapter and the next we shall 
be mainly concerned with various methods of carrying out the necessary analysis. Before 
proceeding, however, we must look a little more closely into the reality of the effects which 
we are investigating and the basis on which we assume that the analysis is legitimate. 


29.7. Perhaps the easiest component to understand and to remove from the series 
is the seasonal effect. This is a fluctuation imposed on the series by a cyclic phenomenon 
external to the main body of causal influences at work upon it. The oscillation in egg- 
production in Table 29.5, for instance, reflects the rhythm in the reproductive process 
which is found among birds in virtue, ultimately, of the fact that the earth goes round 
the sun once a year. Strictly speaking, we ought to confine the word “seasonal” to those
effects which are annual in period; but where no confusion is likely to arise we can apply
the same word and the same ideas to any phenomenon generated by strictly periodic natural
processes, such as “spring” and “neap” variation in tides or daily variation in temperature.
We must, however, be careful about extending the notion of seasonality to phenomena
which are not demonstrated beyond reasonable doubt to depend on strictly periodic stimuli.
For instance, it would be going too far, in the present state of our knowledge, to speak of
sunspot variation as seasonal in this sense, and much too far to speak of seasonality in
crop-yields as determined by sunspots, even if the relation between the two were established.
We shall return to this point below when defining what we mean by a “cycle”
as distinct from an “oscillation”.

29.8. As we noted in 29.4, the seasonal effect may already be removed from the
series by the way in which the data are specified. Where we ourselves have any choice 
in the determination of the data, we may eliminate seasonality in the same way, namely, 
by selecting for measurement of the series a point of time which is fixed in relation to the 
year, such as June 4th for the agricultural returns of England and Wales, or by averaging 
over the year, or (what is much the same thing) by cumulating the series over the year, 
as for instance with rainfall data. 
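
Both devices can be illustrated with a short Python sketch. The monthly series used here is hypothetical: a constant level with a strictly annual harmonic superposed, with no trend or random component, so that the effect of each device is seen in isolation.

```python
import math


def monthly_series(years=3, level=10.0, amplitude=4.0):
    """Hypothetical monthly values: a constant level plus a strictly annual harmonic."""
    return [level + amplitude * math.sin(2 * math.pi * (k % 12) / 12)
            for k in range(12 * years)]


def annual_totals(monthly):
    """Cumulate the series over each complete year of twelve months."""
    return [sum(monthly[i:i + 12]) for i in range(0, len(monthly), 12)]


def fixed_month(monthly, month_index):
    """Observe the series at one fixed point of each year (0 = first month)."""
    return monthly[month_index::12]


data = monthly_series()
print(annual_totals(data))   # the same total every year, apart from rounding error
print(fixed_month(data, 5))  # the same value every year
```

In either case the annual series shows no seasonal movement, although the monthly values fluctuate between 6 and 14.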


29.9. The concept of trend is more difficult to define. Generally, one thinks of it 
as a smooth broad motion of the system over a long term of years, but “long” in this connection
is a relative term, and what is long for one purpose may be short for another. For
example, if we were examining rainfall records over a hundred years a slow rise from the 
beginning of the period to the end would be regarded as a trend ; but if we possessed records 
for two thousand years (and the rings in some of the giant redwood trees give an index of 
climatic conditions for periods of this order) the rise over a particular century might appear 
as part of a slow oscillatory movement, so that any inference from the “trend” in a particular
century to the effect that the weather was likely to continue becoming wetter and
wetter might be quite false. What inference we should make in practice would depend 
on what we were trying to do. If we were engineers designing a water-supply system and 
wished to provide against droughts of reasonable extent, we might perhaps assume that the 
trend would last as long as our works and proceed accordingly; but if we were attempting
to study climatic changes over the face of the earth for geological periods of time we should 
accept the continuance of the trend with the greatest reserve or, more probably, should 
reject it on collateral grounds. 


29.10. However long a series may be, we can never be certain, and often not even 
reasonably sure, that a trend in it is not part of a slow oscillation, except of course when 
the series has terminated (as might, for instance, be the case if we were considering the 
lengths of reigns of the Roman Emperors). In speaking of a trend, therefore, we must 
bear in mind the length of the series to which our statement refers. Perhaps it would be 
more accurate to speak of slow or quick movements rather than of trend and oscillation, 
but even so the distinction between the two would remain a matter of subjective judgment 
to some extent. 


29.11. When seasonal variation and trend have been removed from the data we 
are left with a series which will present, in general, fluctuations of a more or less regular 
kind. Fig. 29.1 represents the kind of series we obtain, since it has no components of 
trend or seasonality. The question then arises, is this residual series systematic in the 
sense that its values can be represented as a function of the time ? Or, on the other hand, 
are the values random in the sense that they could occur, in the observed order, by random 
sampling from a homogeneous population ? Or again, is there some possibility intermediate 
between complete functional variation and complete randomness? The search for systematic
effects in residual fluctuation gives rise to several techniques of analysis, the object
of which is to detect whether any part of the series is subject to law, and therefore predictable,
and whether any part is purely haphazard. The former part we shall call systematic,
and it will be referred to as an “oscillation” (not a “cycle”, which is a very special case
of an oscillation, as we shall see later). The remainder of the series we shall call the unsystematic
component, and refer to its movements as “random”. When a series is a mixture
of oscillation and random movement it will not cause any inconvenience to refer to the
up-and-down movement generally as fluctuation before we have analysed it into its constituents;
that is to say, we may speak of fluctuation without prejudice to the possibility
of detecting oscillatory movements in it.
In this chapter we study trend and random residuals. In the next chapter we shall
deal with oscillatory and cyclical components.


29.12. The logician or the economist who wants to be difficult can always maintain 
that, although any series can be separated into our three specified components as a matter 
of mathematical or statistical analysis, the results throw little or no light on the causal 
influences at work to produce the series. To such a critic we have to concede, I think, 
that in carrying out the analysis we have at the back of our minds the strong possibility 
that the three elements are due to independent causal systems. If he refuses to accept 
this view—and some economists do—we can only invite him to produce a better statistical 
method. 

Possibly the reader will feel, on reaching the end of Chapter 30, that we have not been 
wasting our time, and that our methods do throw light on the way in which time-series 
behave. If not, he should consult some of the references and see whether he finds them
statistically more satisfying.


Determination of Trend 


29.13. It is an essential part of the concept of trend that the movement over fairly 
long periods is smooth. This means that we can represent the trend component, at least 
locally, by a polynomial in the time element t. Thus, given the series $u_t$ we may, in the
first instance, seek for some polynomial

$$u_t = a_0 + a_1 t + a_2 t^2 + \ldots + a_p t^p \tag{29.3}$$

which will give an account of the trend movement. By taking p great enough we can, of
course, obtain as close a representation as we like to a finite series; and how large we
take p is a matter for decision in particular cases.

If the polynomial is fitted to the whole series by least squares, it evidently gives the
curvilinear regression line of $u_t$ on the variable t. This method would then lead to the
fitting of regressions in the manner of Chapter 22, and we need not repeat here what has 
been said on the subject in that chapter. In Example 22.7 we did, in fact, fit a quartic 
to the population data of Table 29.2 and found a good fit. 


29.14. It is, however, clear that to obtain a satisfactory trend-curve for data such 
as that of Table 29.3 (sheep population), we should have to take a polynomial of rather 
high order. This may appear somewhat artificial and in any case the coefficients of such 
a polynomial, being based on high-order moments, would be very unstable from the sampling 
viewpoint. A more practical objection, though by no means an unimportant one, is that 
if we add another term to the series, as for example if we are keeping an annual series up 
to date from year to year, the work of fitting has to be done afresh each time. Moreover, 
the trend-line may be affected throughout its length. When, therefore, the series has no
very obvious trend such as that of Table 29.2 it is more convenient to use the simpler
methods described below.
Moving Averages


29.15. An alternative to finding a polynomial which will represent the whole series 
is to determine a polynomial which will represent a part of it, and to use different poly- 
nomials for different parts. The simplest method, and one which forms the basis of the 
majority of methods of trend fitting, is to take the first m terms (m being chosen at will), 
fit a polynomial of order p, not greater than m — 1, to them, and use that polynomial to 
determine the value in the middle of its range; then to repeat the operation with the m 
terms from the second to the (m + 1)th, and so on, moving on one term at each stage. 
Unless other considerations require it, we take m to be odd, so that the middle point of 
the range corresponds to a value which is actually observed. Otherwise the middle point 
falls half-way between two observed values, or we have to use some value of the fitted 
polynomial other than the middle point, which results in a loss of useful symmetry. 


29.16. Suppose, then, that the number of terms is chosen to be odd and is denoted, 
with a slight change of notation, by 2m + 1. Without loss of generality we may denote 
the terms by U-m» U—(m—1)> - + + Uo ++ + Um—1» Um: If we choose to fit to them a poly- 
nomia! of the pth order (29.3) we may, in the usual way, determine the coefficients by 
least squares, i.e. solve the equations 


m 


2 : 
Ao B a4 ris a PN ` 90 
o (u, — Go Ve a a, t? ) 0, Dass. . (29.4) 


=—m 
which will give us equations typified by 
Suu) a Z CY- a 2) — Fe. —aQ Z(g) = 0. . « (29.5) 


p 


Now the sums S(t’) are functions of m only. Thus, if we solve (29.5) for a, we shall find 
an equation of the form 


Qo = Co + Cy Wim + Ca mq + + + + + Cima V . . (29.6) 


where the c's depend on m and p, but not on the w’s. 

Now $u_t$ assumes the value $a_0$ at t = 0 and hence this value, as given by (29.6), is the 
value we require for the polynomial. As we see, this is equivalent to a weighted average 
of the observed values, the weights being independent of which part of the series is taken. 
Thus our process of fitting a trend-line consists of determining the constants c (which 
depend on m and p and therefore give us a twofold element of choice) and then calculating, 
for each consecutive set of (2m + 1) terms in the series, a value given by (29.6). If the 
terms are $u_1, \ldots, u_{2m+1}$ the calculated value will correspond to t = m + 1. There will 
be no values corresponding to the m terms at the beginning and the m terms at the end. 
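
The determination of the constants c can be sketched in a few lines of code. The following is only an illustration, assuming Python and exact rational arithmetic; the function name `ls_weights` is hypothetical and not part of the text. It solves the normal equations (29.5) with the observations left symbolic, so that $a_0$ comes out as a list of weights applied to $u_{-m}, \ldots, u_m$.

```python
from fractions import Fraction

def ls_weights(m, p):
    """Weights c such that a_0 = sum_j c[j] * u_j for a least-squares
    polynomial of order p fitted to the 2m+1 points t = -m, ..., m."""
    ts = range(-m, m + 1)
    n = p + 1
    # Row j holds the normal equation sum_i a_i * S(t^(i+j)) = sum_t t^j u_t.
    # The right-hand side is kept as one column per observation u_t.
    rows = []
    for j in range(n):
        lhs = [Fraction(sum(t ** (i + j) for t in ts)) for i in range(n)]
        rhs = [Fraction(t ** j) for t in ts]
        rows.append(lhs + rhs)
    # Gauss-Jordan elimination in exact arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pv = rows[col][col]
        rows[col] = [x / pv for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[col])]
    return rows[0][n:]  # the row that expresses a_0 in terms of the u's

seven_point_cubic = ls_weights(3, 3)
```

With m = 3 and p = 3 the sketch returns the weights −2/21, 3/21, 6/21, 7/21, 6/21, 3/21, −2/21, and the same weights are returned for p = 2.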


Example 29.1 


Suppose we have a series and wish to fit a curve which best approximates to sets of 
seven points; and suppose we regard a cubic as providing a satisfactory approximation. 
What are the weights of the moving average ? 

We have m = 3 and p = 3, and our polynomial is 

$$ u_t = a_0 + a_1 t + a_2 t^2 + a_3 t^3. $$

Taking our origin at t = 0, we find, for equations (29.5), in virtue of the fact that $\Sigma(t^k) = 0$ 
for odd k, 

$$
\begin{aligned}
\Sigma(u) &= 7a_0 + 28a_2 \\
\Sigma(tu) &= 28a_1 + 196a_3 \\
\Sigma(t^2 u) &= 28a_0 + 196a_2 \\
\Sigma(t^3 u) &= 196a_1 + 1588a_3
\end{aligned}
$$

giving, for $a_0$, 

$$ a_0 = \frac{1}{21}(-2u_{-3} + 3u_{-2} + 6u_{-1} + 7u_0 + 6u_1 + 3u_2 - 2u_3). $$


We may write this conveniently as 

$$ \frac{1}{21}[-2, 3, 6, 7, 6, 3, -2] $$

or, when symmetrical formulae are used, as in the present case, by 

$$ \frac{1}{21}[-2, 3, 6, \mathbf{7}, \ldots] $$

denoting the middle term by heavy type. 
To take a simple illustration, suppose the series is given by the following values: 

t:      1    2    3    4    5    6    7    8    9    10 
u_t:    0    1    8    27   64   125  216  343  512  729 

We have, for the trend value at t = 4, 

$$ a_0 = \frac{1}{21}\{(-2 \times 0) + (3 \times 1) + (6 \times 8) + (7 \times 27) + (6 \times 64) + (3 \times 125) - (2 \times 216)\} = \frac{567}{21} = 27. $$

Similarly, at t = 6 we find 

$$ a_0 = \frac{1}{21}\{(-2 \times 8) + (3 \times 27) + \ldots - (2 \times 512)\} = 125. $$

In both cases the trend-value is equal to the actual value of the series, and this obviously 
must be so when we note that we are fitting a cubic to the series 

$$ u_t = (t - 1)^3. $$
It will be observed that in this example we should have obtained the same value for 
$a_0$ if we fitted quadratics instead of cubics; and generally the case p odd includes the 
case of the next lowest (even) value of p, so that we need not give separate formulae for 
even p. 
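
The arithmetic of this illustration is easily checked mechanically. The sketch below, in Python, applies the seven-point weights to the series $u_t = (t - 1)^3$; the helper name `trend` is an assumption made here for illustration.

```python
weights = [-2, 3, 6, 7, 6, 3, -2]               # divisor 21, from Example 29.1
series = [(t - 1) ** 3 for t in range(1, 11)]   # 0, 1, 8, ..., 729 for t = 1..10

def trend(series, weights, divisor):
    """Moving weighted average; returns {t: trend value}, with t counted from 1.
    The first and last m values of t have no trend value."""
    m = len(weights) // 2
    out = {}
    for centre in range(m, len(series) - m):
        window = series[centre - m: centre + m + 1]
        out[centre + 1] = sum(w * u for w, u in zip(weights, window)) / divisor
    return out

trend_values = trend(series, weights, 21)
```

Here `trend_values[4]` is 27 and `trend_values[6]` is 125, and no values are produced for t = 1, 2, 3 or t = 8, 9, 10.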
29.17. Writing $a_0[k]$ for the value of $a_0$ calculated in the above manner for an average 
of k successive terms, we find the following formulae up to p = 5. The reader may care 
to verify them for himself as an exercise. 

Quadratic and Cubic 

$$
\begin{aligned}
a_0[5] &= \tfrac{1}{35}[-3, 12, \mathbf{17}, \ldots] \\
a_0[7] &= \tfrac{1}{21}[-2, 3, 6, \mathbf{7}, \ldots] \\
a_0[9] &= \tfrac{1}{231}[-21, 14, 39, 54, \mathbf{59}, \ldots] \\
a_0[11] &= \tfrac{1}{429}[-36, 9, 44, 69, 84, \mathbf{89}, \ldots] \\
a_0[13] &= \tfrac{1}{143}[-11, 0, 9, 16, 21, 24, \mathbf{25}, \ldots] \\
a_0[15] &= \tfrac{1}{1105}[-78, -13, 42, 87, 122, 147, 162, \mathbf{167}, \ldots] \\
a_0[17] &= \tfrac{1}{323}[-21, -6, 7, 18, 27, 34, 39, 42, \mathbf{43}, \ldots] \\
a_0[19] &= \tfrac{1}{2261}[-136, -51, 24, 89, 144, 189, 224, 249, 264, \mathbf{269}, \ldots] \\
a_0[21] &= \tfrac{1}{3059}[-171, -76, 9, 84, 149, 204, 249, 284, 309, 324, \mathbf{329}, \ldots]
\end{aligned}
\tag{29.7}
$$

Quartic and Quintic 

$$
\begin{aligned}
a_0[7] &= \tfrac{1}{231}[5, -30, 75, \mathbf{131}, \ldots] \\
a_0[9] &= \tfrac{1}{429}[15, -55, 30, 135, \mathbf{179}, \ldots] \\
a_0[11] &= \tfrac{1}{429}[18, -45, -10, 60, 120, \mathbf{143}, \ldots] \\
a_0[13] &= \tfrac{1}{2431}[110, -198, -135, 110, 390, 600, \mathbf{677}, \ldots] \\
a_0[15] &= \tfrac{1}{46189}[2145, -2860, -2937, -165, 3755, 7500, 10125, \mathbf{11063}, \ldots] \\
a_0[17] &= \tfrac{1}{4199}[195, -195, -260, -117, 135, 415, 660, 825, \mathbf{883}, \ldots] \\
a_0[19] &= \tfrac{1}{7429}[340, -255, -420, -290, 18, 405, 790, 1110, 1320, \mathbf{1393}, \ldots] \\
a_0[21] &= \tfrac{1}{260015}[11628, -6460, -13005, -11220, -3940, 6378, 17655, 28190, 36660, 42120, \mathbf{44003}, \ldots]
\end{aligned}
\tag{29.8}
$$


29.18. Several methods have been proposed to simplify the arithmetic of fitting 
a trend-line by moving averages, the large numbers in some of the expressions in (29.7) 
and (29.8) involving considerable labour in straightforward application. The simplest, 
perhaps, is that of iterated averages. 


Suppose we take an average of sets of four with equal weights, a very simple process, 
and then another average of the same kind of that average. If the primary series is 
$u_t$, the result of the first operation will be to give a series 

$$ v_1 = \tfrac{1}{4}(u_1 + u_2 + u_3 + u_4), \qquad v_2 = \tfrac{1}{4}(u_2 + u_3 + u_4 + u_5), \quad \text{etc.,} $$

and that of the second operation to give 

$$ \tfrac{1}{4}(v_1 + v_2 + v_3 + v_4) = \tfrac{1}{16}[u_1 + 2u_2 + 3u_3 + 4u_4 + 3u_5 + 2u_6 + u_7]. \tag{29.9} $$

We may write this symbolically as 

$$ \frac{1}{16}[1, 2, 3, \mathbf{4}, \ldots] \tag{29.10} $$

or, reserving the symbol $\frac{1}{k}[k]$ for a simple arithmetic mean of k terms, as 

$$ \frac{1}{16}[4]^2. \tag{29.11} $$


Now compare the weights of the average derived in Example 29.1 for fitting a cubic 
to seven points. Reduced to unit divisors we have for the weights of the latter 

−0.0952, 0.1429, 0.2857, 0.3333, . . . 

and for the weights of (29.9) 

0.0625, 0.1250, 0.1875, 0.2500, . . . 

The two are not identical, but they follow the same sort of course and it might be possible 
to regard the latter as an approximation to the former. (We shall derive better approxi- 
mations presently, but this will serve for purposes of illustration.) Now the iterated 
summation resulting in (29.9) is much easier to carry out than the single weighted averaging 
process of Example 29.1. Generally, if we can find averages with simple integral weights, 
preferably unity, which will, in conjunction, give approximations to the more complicated 
weights of a single average, it is usually easier to use the iteration process. 
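
The equivalence between iterated simple averages and a single weighted average is just the multiplication of the corresponding weight sequences, as in (29.9). A minimal sketch in Python, with the helper name `convolve` chosen here for illustration:

```python
def convolve(a, b):
    """Weights of the average obtained by applying average a and then average b."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

sum_of_four = [1, 1, 1, 1]                       # the operator [4], divisor 4
twice_four = convolve(sum_of_four, sum_of_four)  # the operator [4]^2, divisor 16
unit_weights = [w / 16 for w in twice_four]
```

`twice_four` is [1, 2, 3, 4, 3, 2, 1], so that `unit_weights` begins 0.0625, 0.1250, 0.1875, 0.2500 as quoted above.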


29.19. In the notation of finite differences, write 

$$ \Delta u_t = u_{t+1} - u_t \tag{29.12} $$
$$ E u_t = u_{t+1} = (1 + \Delta) u_t \tag{29.13} $$
$$ \delta u_t = u_{t+\frac{1}{2}} - u_{t-\frac{1}{2}}. \tag{29.14} $$

We have, for the second "central" difference $\delta^2 u_t$, 

$$ \delta^2 u_t = (u_{t+1} - u_t) - (u_t - u_{t-1}) = (E - 2 + E^{-1}) u_t. \tag{29.15} $$

Writing 

$$ E = \exp(2i\phi) \tag{29.16} $$

we find, symbolically, 

$$ \delta^2 = E - 2 + E^{-1} = \exp(2i\phi) + \exp(-2i\phi) - 2 = -4\sin^2\phi. \tag{29.17} $$

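Then 

$$ \frac{1}{2m+1}\sum_{t=-m}^{m} u_t = \frac{1}{2m+1}\sum_{t=-m}^{m} E^t u_0 = \frac{1}{2m+1}\Bigl\{1 + 2\sum_{j=1}^{m}\cos 2j\phi\Bigr\} u_0, $$

since the terms in $\sin 2j\phi$ vanish, 

$$ = \frac{1}{2m+1}\,\frac{\sin (2m+1)\phi}{\sin\phi}\,u_0. \tag{29.18} $$

Thus, writing k = 2m + 1 and expanding in powers of $\sin^2\phi = -\tfrac{1}{4}\delta^2$, 

$$ \frac{1}{k}[k]\,u_0 = \Bigl\{1 + \frac{k^2 - 1}{24}\,\delta^2 + \ldots\Bigr\} u_0. \tag{29.19} $$

This interesting formula gives the arithmetic average in terms of the middle term $u_0$ and 
its central differences. 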

If now our series is approximately represented by a cubic, so that fourth differences 
vanish, we have 

$$ \frac{1}{k}[k]\,u_0 = u_0 + \frac{k^2 - 1}{24}\,\delta^2 u_0 \tag{29.20} $$

and this equation will in any case be true up to third differences. Similarly, for two iterated 
averages we have, to the same order, 

$$ \frac{1}{kl}[k][l]\,u_0 = u_0 + \frac{1}{24}(k^2 + l^2 - 2)\,\delta^2 u_0 \tag{29.21} $$

and so on. We will use these results to derive two formulae in very general use by actuaries 
for "graduating" a series, a process which is very similar to that of fitting a trend-line. 
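
The relation between a simple average of k terms and the second central difference of the middle term can be verified exactly on any cubic. The sketch below is an illustration in Python; the particular cubic and the helper names are arbitrary choices made here, and for even k the k values are taken at half-integer offsets about the centre.

```python
from fractions import Fraction

def u(t):
    """An arbitrary cubic, chosen only for illustration."""
    return Fraction(2) - 3 * t + Fraction(1, 2) * t ** 2 + t ** 3

def centred_mean(f, k):
    """Mean of k equally spaced values of f centred on t = 0."""
    offsets = [Fraction(2 * i - (k - 1), 2) for i in range(k)]
    return sum(f(h) for h in offsets) / k

def second_central_difference(f):
    return f(Fraction(1)) - 2 * f(Fraction(0)) + f(Fraction(-1))

checks = {
    k: centred_mean(u, k)
       == u(Fraction(0)) + Fraction(k * k - 1, 24) * second_central_difference(u)
    for k in range(2, 10)
}
```

Every entry of `checks` is true, since for a cubic the fourth and higher differences vanish.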


Example 29.2. Spencer's 15-point Formula 
Consider three successive averages with equal weights 

$$ \frac{1}{80}[4]^2[5]\,u_0 = u_0 + \frac{1}{24}(4^2 - 1 + 4^2 - 1 + 5^2 - 1)\,\delta^2 u_0 = u_0 + \frac{9}{4}\,\delta^2 u_0. $$

We then have, to third differences, 

$$ u_0 = \frac{1}{80}[4]^2[5]\,(1 - \tfrac{9}{4}\delta^2)\,u_0. $$

Substituting for $\delta^2$ the formula $[1, -2, 1]$, as given by (29.15), we find 

$$ u_0 = \frac{1}{320}[4]^2[5]\,[-9, 22, -9]. $$

Now without affecting the order of the approximation we may add factors in $\delta^4$ or higher 
central differences, and can simplify the numerical coefficients to some extent. Let us 
add to the factor $[-9, 22, -9]$ a term $-3\delta^4 = [-3, 12, -18, 12, -3]$. The sum 
is $[-3, 3, 4, 3, -3]$, giving 

$$ u_0 = \frac{1}{320}[4]^2[5]\,[-3, 3, \mathbf{4}, \ldots]. $$

This is Spencer's 15-point formula. It covers sets of 15 consecutive terms, the weights 
in full being 

$$ \frac{1}{320}[-3, -6, -5, 3, 21, 46, 67, \mathbf{74}, \ldots]. $$

Example 29.3. Spencer's 21-point Formula 
In a similar way we find 

$$ \frac{1}{175}[5]^2[7]\,u_0 = (1 + 4\delta^2)\,u_0, $$

giving, to third differences, 

$$ u_0 = \frac{1}{175}[5]^2[7]\,(1 - 4\delta^2)\,u_0 = \frac{1}{175}[5]^2[7]\,[-4, 9, -4]. $$

We now add to the factor $[-4, 9, -4]$ the expression 

$$ -3\delta^4 - \tfrac{1}{2}\delta^6 = [-\tfrac{1}{2}, 0, \tfrac{9}{2}, -8, \tfrac{9}{2}, 0, -\tfrac{1}{2}], $$

giving 

$$ u_0 = \frac{1}{175}[5]^2[7]\,[-\tfrac{1}{2}, 0, \tfrac{1}{2}, 1, \tfrac{1}{2}, 0, -\tfrac{1}{2}] = \frac{1}{350}[5]^2[7]\,[-1, 0, 1, \mathbf{2}, \ldots]. $$

This is Spencer's 21-point formula. 
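
The full weights of the two Spencer formulae follow by multiplying out the factors. A short sketch in Python, assuming nothing beyond the factorisations given in Examples 29.2 and 29.3 (the function names are illustrative only):

```python
def convolve(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def compose(weights, simple_sums):
    """Apply the simple sums [k] in turn to an initial weighted sum."""
    for k in simple_sums:
        weights = convolve(weights, [1] * k)
    return weights

spencer_15 = compose([-3, 3, 4, 3, -3], (5, 4, 4))        # divisor 320
spencer_21 = compose([-1, 0, 1, 2, 1, 0, -1], (7, 5, 5))  # divisor 350

def apply_centre(weights, divisor, values):
    """Weighted average of as many values as there are weights."""
    return sum(w * v for w, v in zip(weights, values)) / divisor

# Both formulae reproduce a cubic exactly at the middle point.
cubic = [t ** 3 - 4 * t ** 2 + t for t in range(-10, 11)]
middle_15 = apply_centre(spencer_15, 320, cubic[3:18])
middle_21 = apply_centre(spencer_21, 350, cubic)
```

`spencer_15` begins −3, −6, −5, 3, 21, 46, 67, 74, in agreement with the weights quoted in Example 29.2, and both `middle_15` and `middle_21` equal the value of the cubic at t = 0, namely zero.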


29.20. A few practical points arising in the application of the foregoing formulae 
are worth mentioning. 

(a) The order in which the iterations are carried out is of course immaterial, as the 
reader can easily verify. It is therefore more convenient, as a rule, to carry out the more 
complicated operations first, while the numbers being handled remain small. For instance, 
in applying the Spencer 15-point formula we should carry out the moving average 
$[-3, 3, 4, 3, -3]$ first, then apply the simple average $\frac{1}{5}[5]$, and then the two averages 
of four. This does not apply if the series is short, inasmuch as there are fewer of the final 
than of the initial operations. 

(b) The use of a moving average of extent 2k + 1 involves the absence of k terms at 
the end and k terms at the beginning of the trend-series. If the original series is short the 
loss may be serious, and this effect sometimes restricts considerably the extent of the 
average which we are able to apply. 

(c) It is possible to remedy the deficiency at the ends of the series by special formulae, 
but the values so derived have less reliability than those of the main trend-line, and on 
the whole it seems better to accept the loss of 2k terms unless trend-values for the beginning 
and end of the series are really essential. 

(d) As yet we have given no guide as to the choice of most suitable values of m and p. 
In practice we do not usually require to fit curves of degree higher than five, and often 
a cubic is sufficient, as is assumed in the Spencer formulae. There is greater elasticity in 
the choice of m, but the point mentioned in (b) above requires m to be as small as possible, 
consistent with other requirements. We shall see later in the chapter that the 
variate-difference method gives some further guide as to p, and that certain effects of 
trend-elimination on random elements bear on the extent determined by m. 

(e) There is a voluminous literature on trend-fitting which appears to me out of 
proportion to the importance of the subject. It is not difficult to pursue inquiries on the 
above lines to the point of extreme apparent precision and great mathematical complexity, 
and perhaps such work is valuable where the series is fairly smooth and not disturbed 
seriously by sampling variation or superposed random fluctuation. But many of the 
series encountered in statistical practice will not bear the weight of great refinement in 
trend-fitting. The student will probably find that a knowledge of fitting by moving 
averages will be sufficient for all ordinary and many extra-ordinary purposes. 


The Effect of Trend-elimination on Other Components 


29.21. In Table 29.6 we have applied the Spencer 21-point formula to an artificial 
series obtained by adding a random element to a cubic. Specifically, 

$$ u_t = (t - 26) + \tfrac{1}{10}(t - 26)^2 + \tfrac{1}{100}(t - 26)^3 + \varepsilon_t. \tag{29.22} $$

The component $\varepsilon_t$ was taken from tables of random numbers and consists of samples from 
a population in which all integral values from 0 to 99 are equally frequent. The various 
columns of the table illustrate the process of fitting, and we may note in passing that for 
a series as short as this it is convenient to leave the more difficult summations to the last 
as there are substantially fewer of them. 

Now we know that the Spencer formula will fit a cubic exactly, so that when we 
subtract the trend from the original series we ought to eliminate the systematic constituent 
entirely and be left with our random component, except in so far as we have rounded off the 
systematic element to the nearest unit. A comparison of columns (2) and (9) in Table 29.6, 
remembering that the latter includes an element 49.5 equal to the mean of the random 
component, shows that we do not do so. The reason is not far to seek. The moving 
average has acted on the random element itself and determined a trend-line in it. 

The results of applying the Spencer 21-point formula to the random element $\varepsilon_t$ are 
shown in column (11). We should expect that if the method were perfect the values in 
this column would be 49.5, the mean of $\varepsilon$, apart from irregular sampling effects; but 
not only do the observed values deviate from this mean, they do so systematically, the 
values having a small oscillatory movement which is shown as part of the trend. 


29.22. This effect can assume considerable importance, particularly if we are 
eliminating trend so as to concentrate attention on oscillations. We proceed to examine it more 
closely. 

Suppose that we have a series composed of the sum of three parts, a trend $\phi_1(t)$, an 
oscillatory term $\phi_2(t)$, and a random element $\phi_3(t)$, so that 

$$ u_t = \phi_1(t) + \phi_2(t) + \phi_3(t). \tag{29.23} $$

TABLE 29.6 
Series given by Equation (29.22) with Trend-Line determined by a Spencer 21-point Formula. 

(1) | (2)         | (3)  | (4)  | (5)     | (6)     | (7)     | (8)                     | (9)         | (10)                | (11) 
t   | Cubic Term. | ε_t  | u_t  | [5] u_t | [5] (5) | [7] (6) | [−1, 0, 1, 2, . . .] (7) | (1/350) (8) | Deviation u_t − (9) | Graduation of ε alone. 
JUST T8: 5:99.15 96 so eund | 1 M , 
EIE rad = tbe W090) |e e | : ie ; Led 
3|—92| 75 —17 | —246 ys. | > ane s a 
4.| — 80| 48 | —32 | —209 | ... | | Sn : ae 
5 | — 70| 59 —1l | — 87 | —572 | se s Don 
6 | — 60 1 —59 | — 42| —241 | ... eis , 5 
E 32 12:2 382 4^ 4l 2p! A dr 
Buge-c44 |- 72 28 85| 413] 2,233| " i NS 
9 | — 37| 59 22 194 | 670 | 3,801) v p s 
POR C— 51) 95 62 164 | 844] 5,120 AT me „S m 
E a 26.| 78 50 | 215| 957| 5,984| 14,352 41 9 67 
Tou cod n4 2 | 186| 996) 0,042| 15,470 44 —42 66 
18 | —18| 97 79 198 | 1,078 | 7,041| 15,815 45 34 63 
14 | — 16 8 — 7 | 233| 1020 | 7,145) 15,076 45 —52 60 
Lele) pee TT) 74 | 246 | 1,071 | 7,038| 14,978 43 31 55 
16 |— 10| 95 85 163 | 1,069 6,934 14,106 40 45 51 
DIVI 8 95 15 231 | 948 | 6,709| 13,379 38 —23 47 
Ta zd 3 — 4 196 | 850| 6,535) 12,703 36 —40 43 
19 | —-8) 87 61 112 | 892| 6,408) 12,169 35 26 40 
20 | — 5| 44 39 148| 853 | 6,363) 12,102 35 4 39 
AER 5 1 205 |: 852| 6,440 | 12,279 35 —34 39 
2921.92 gets 51 192| 944| 6,611) 12,676 36 15 39 
PEERS | ets 53 195 | 1,024 | 6,769) 13,228 38 15 40 
2409550 48 204 | 1,031,| 7,052| 13,857 40 8 4l 
PARA aa A 42 228 | 1,015 | 7,353| 14,508 41 1 42 
26 0| 10 10 212 | 1,000 | 7,010| 15,120 43 —33 43 
27 1| 74 75 176 | 1,136 | 7,923| 15,634 45 30 44 
28 2| 35 37 230 | 1,153 | 8,249| 16,251 46 = 9 44 
29 4 8 12 290 | 1,201 | 8,607/ 17,002 49 E 45 
30 6| 90 96 245 | 1,937 | 9,019| 17,717 51 45 44 
31 9| 6l 70 260 | 1,357 | 9,424| 18,499 53 17 44 
32 19.| 248 30 312 | 1373 | 9,870| 19,307 55 -—25 43 
33 15| 37 52 | 250| 1,462 |10,429| 20,159 58 —6 42 
34 20| 44 64 | 306 | 1,541 | 10,989| 21,133 60 4 4l 
35 24 |. 10 34 334 1,509 | 11,679 | 22,417 64 —30, 39 
36 30| 96 126 339 | 1,760 | 12,539 | 23,797 68 58 38 
37 36 | 22 58 370 | 1,897 | 13,529 | 25,737 74 —16 37 
38 44 | 13 57 411 | 2,047 | 14,699 | 27,955 80 ED 36 
39 52| 43 95 443 | 2,233 | 16,060! 30,456 87 8 35 
40 61| 14 75 484 | 2,152 | 17,570| 33,934 95 —20 34 
41 71: 87 158 525 | 2,711 | 19,353) 36,716 105 53 34 
42 83 | 16 99 589 | 2,960 | 21,394] se. AS D MS 
43 95 3 98 670 | 3,270 | 23,690 aoe v x: Tui 
44 109 50 159 | 692 | 3,680 | 26,255 A se ess 
45 124| 32 156 | 794| 4088 | ... E i m E 
46 140 | 40 180 935 | 4,529 oe s p 
47 158 | 43 201 997 | 5,017 | ... 5 
48 177| 62 239 | 1,111] ... A. , ; 
49 198 | 23 221 | 1,180 ue o vs 
50 220 | 60 «|. 270 | ... | ‘3 be EN 4 
51 244 5 249 
If we determine the trend by a moving average, denoted by an operation T, then clearly 

$$ T u_t = T\phi_1 + T\phi_2 + T\phi_3. \tag{29.24} $$

Let us now suppose that our method of determining trend is perfect in the sense that 
$T\phi_1 = \phi_1$. Then, on subtracting (29.24) from (29.23) to eliminate trend, we find 

$$ u_t - T u_t = (\phi_2 - T\phi_2) + (\phi_3 - T\phi_3). \tag{29.25} $$

The point of present interest is that the terms $T\phi_2$ and $T\phi_3$ in (29.25) may distort 
the genuinely oscillatory parts of the residual series and induce spurious oscillatory 
movements. 


29.23. Consider the simple case when $\phi_2$ is a sine term, $\sin(\alpha + \lambda t)$, t being integral. 
Since 

$$ \sum_{t=1}^{k} \sin(\alpha + \lambda t) = \frac{\sin\frac{1}{2}k\lambda}{\sin\frac{1}{2}\lambda}\,\sin\{\alpha + \tfrac{1}{2}(k+1)\lambda\}, \tag{29.26} $$

a simple moving average of k consecutive terms will result in a sine series of the same 
period and phase as the original, but with the amplitude reduced by the factor 

$$ \frac{1}{k}\,\frac{\sin\frac{1}{2}k\lambda}{\sin\frac{1}{2}\lambda}. \tag{29.27} $$

Iteration q times will reduce the amplitude by the qth power of this factor. 

Thus the term $T\phi_2$ will be small if k is large, q is large, or if $\frac{1}{2}k\lambda$ is a multiple of $\pi$, 
that is, if the extent of the moving average is a period of the oscillation. But if $\lambda$ is small 
and $k\lambda$ is small the amplitude is reduced very little and $\phi_2 - T\phi_2$ will largely disappear, 
i.e. the moving average will partially obliterate the term in $\phi_2$. In this case, $k\lambda$ being 
small, the extent of the moving average is small compared with the period of the harmonic 
term, that is to say the oscillation is a slow one. This result is what we should expect. 
A slow oscillation is treated as a trend by the moving average and eliminated accordingly. 
Generally, the moving average will emphasise the shorter oscillations at the expense of the 
longer ones. Furthermore, if the extent of the average is slightly greater than the period, 
the term (29.27) may have a negative sign, and consequently the difference from the trend 
may somewhat exaggerate the true oscillations. 

It is not so easy to exhibit the precise effect of the moving average when the weights 
are unequal and the terms are not harmonic, but evidently the same kind of situation is 
apt to arise. 
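
The behaviour of the reducing factor (29.27) may be seen numerically. The sketch below, in Python, compares the factor with a directly computed centred average of a sine series; the function names and the particular values of k, λ and α are assumptions made only for illustration.

```python
import math

def amplitude_factor(k, lam):
    """The factor (29.27) for a simple average of extent k."""
    return math.sin(0.5 * k * lam) / (k * math.sin(0.5 * lam))

def averaged_sine(k, lam, alpha, t):
    """Simple average of sin(alpha + lam * s) over the k integers centred on t (k odd)."""
    h = k // 2
    return sum(math.sin(alpha + lam * s) for s in range(t - h, t + h + 1)) / k

k, lam, alpha, t = 5, 0.3, 0.7, 10
direct = averaged_sine(k, lam, alpha, t)
predicted = amplitude_factor(k, lam) * math.sin(alpha + lam * t)

period = 12
lam_p = 2 * math.pi / period
factor_at_period = amplitude_factor(12, lam_p)   # extent equal to the period
factor_just_over = amplitude_factor(13, lam_p)   # extent slightly greater
factor_slow = amplitude_factor(5, 2 * math.pi / 200)
```

`direct` and `predicted` agree to rounding error; the factor vanishes when the extent equals the period, is negative when the extent slightly exceeds it, and is close to unity for a slow oscillation.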


29.24. Now consider the effect of a simple moving average (that is, one with equal 
weights) on the residual element $\phi_3$ which we will suppose to be a random element $\varepsilon_t$ with 
variance v. For the term $T\phi_3$ we have 

$$ T\phi_3(t) = \frac{1}{k}\sum_{j=-[\frac{1}{2}k]}^{[\frac{1}{2}k]} \varepsilon_{t+j}, \tag{29.28} $$

where $[\frac{1}{2}k]$ is the greatest integer which does not exceed $\frac{1}{2}k$. Consecutive values of $\varepsilon_t$ are 
independent, but consecutive values of $T\phi_3$ are not; for $T\phi_3(a)$ and $T\phi_3(b)$ have 
k − (a − b) values of $\varepsilon$ in common and are correlated if a − b < k. Thus the series $T\phi_3$ 
will be much smoother than $\phi_3$, and if we proceed to further averagings will become smoother 
still. We have had an example of this effect in Table 29.6, and shall meet further 
examples below. 

29.25. The effect of taking a moving average of a random series will then be to 
generate an oscillatory series, provided that the weights are such as to give a positive 
correlation between successive members of the generated series, a condition which is always 
realised in moving averages employed for trend-fitting. We shall call this the Slutzky- 
Yule effect, after the two statisticians who (independently) studied it in detail. 

The generated series is not regular in the cyclical sense, that is to say its peaks and 
troughs do not recur at equal intervals of time, and the amplitudes of the oscillations vary 
considerably. Nevertheless such oscillations present a striking resemblance to the kind 
of movement which is found in practice, particularly in economic time-series, and we shall 
consider them in more detail in Chapter 30. For our present purposes we require to con- 
sider how far the process of trend-elimination itself may generate such effects in order 
to be sure that oscillatory movements in a trend-free series have not been put there, so 
to speak, by our own arithmetical processes. 
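
The effect is readily reproduced by simulation. The following sketch, in Python, smooths an artificial random series and compares the correlation between successive terms before and after smoothing; the weights, the seed and the series length are arbitrary choices made here for illustration, and the normal variates are an assumption of the sketch.

```python
import random

def moving_sum(series, weights):
    k = len(weights)
    return [sum(w * x for w, x in zip(weights, series[i:i + k]))
            for i in range(len(series) - k + 1)]

def lag1_correlation(series):
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    return sum(a * b for a, b in zip(dev, dev[1:])) / sum(d * d for d in dev)

rng = random.Random(0)                      # fixed seed, arbitrary
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
weights = [1, 2, 3, 3, 3, 2, 1]             # a sum of fives of a sum of threes
smoothed = moving_sum(noise, weights)

r_noise = lag1_correlation(noise)
r_smoothed = lag1_correlation(smoothed)
# Correlation between successive terms implied by the weights alone.
r_theory = (sum(a * b for a, b in zip(weights, weights[1:]))
            / sum(a * a for a in weights))
```

The random series itself shows practically no correlation between successive terms, whereas the smoothed series shows a correlation close to `r_theory`, here 34/37.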


29.26. For this purpose we shall consider the period and variance of a series gen- 
erated by the Slutzky-Yule effect. 

Since the peaks and troughs do not recur at equal intervals there is no quantity which 
we can conveniently call the length of the oscillation. There will, in fact, be a distribution 
of lengths. We may define as the mean length either the mean period from peak to peak, 
or that from trough to trough; but this raises some difficulties as to whether we are 
prepared to admit as periods small ripples on the main undulation. 

Recognising its somewhat arbitrary character, we shall take as our measure of 
oscillatory length the mean distance between "upcrosses", that is to say the mean distance 
between points where the series changes sign from negative to positive or "crosses the 
x-axis". Suppose the series is generated by a moving average with weights $a_1, \ldots, a_k$ 
of a random variable which is normally distributed with variance v. Then the probability 
that 

$$ u_t = \sum_{j=1}^{k} a_j \varepsilon_j < 0 \tag{29.29} $$

and 

$$ u_{t+1} = \sum_{j=1}^{k} a_j \varepsilon_{j+1} > 0, \tag{29.30} $$

i.e. that the generated series changes sign from negative to positive, is the proportional 
frequency of 

$$ dF = \frac{1}{(2\pi v)^{\frac{1}{2}(k+1)}} \exp\Bigl\{-\frac{1}{2v}\sum_{j=1}^{k+1} \varepsilon_j^2\Bigr\}\, d\varepsilon_1 \ldots d\varepsilon_{k+1} \tag{29.31} $$

between the hyperplanes $\sum_{j=1}^{k} a_j \varepsilon_j = 0$ and $\sum_{j=1}^{k} a_j \varepsilon_{j+1} = 0$. This is equal to the angle 
between these two planes, expressed as a fraction of $2\pi$; the angle is given by 

$$ \cos\theta = \frac{a_1 a_2 + a_2 a_3 + \ldots + a_{k-1} a_k}{a_1^2 + a_2^2 + \ldots + a_k^2}. \tag{29.32} $$

Hence the mean distance between upcrosses is $2\pi/\theta$, where $\theta$ is given by (29.32). 

29.27. In a similar way, the probability that 


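$$ u_{t+1} - u_t < 0, \tag{29.33} $$
$$ u_t - u_{t-1} > 0, \tag{29.34} $$

that is that $u_t$ is a peak of the series, is the angle (again as a fraction of $2\pi$) between the 
two hyperplanes 

$$ \sum_{j=1}^{k} a_j \varepsilon_{j+1} - \sum_{j=1}^{k} a_j \varepsilon_j = 0, \tag{29.35} $$

$$ \sum_{j=1}^{k} a_j \varepsilon_j - \sum_{j=1}^{k} a_j \varepsilon_{j-1} = 0, \tag{29.36} $$

and the angle is given by 

$$ \cos\theta_1 = \frac{(a_2 - a_1)a_1 + (a_3 - a_2)(a_2 - a_1) + \ldots + (a_k - a_{k-1})(a_{k-1} - a_{k-2}) - a_k(a_k - a_{k-1})}{a_1^2 + (a_2 - a_1)^2 + \ldots + (a_k - a_{k-1})^2 + a_k^2}. \tag{29.37} $$

Thus the mean distance between peaks is $2\pi/\theta_1$. The same formula obviously applies to 
the mean distance between troughs. 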


29.28. If we wish to exclude "ripples" of a certain length d from consideration 
we may inquire for the probability that (29.33) and (29.34) are satisfied in conjunction with 

$$ u_t > u_{t+d}. \tag{29.38} $$

This is evidently the area cut off on the unit sphere by the three planes (29.35), (29.36) and 

$$ \sum_{j=1}^{k} a_j \varepsilon_j - \sum_{j=1}^{k} a_j \varepsilon_{j+d} = 0. \tag{29.39} $$

If the angles between the planes are A, B and C this area is $A + B + C - \pi = \Omega$, say. 
The mean length between peaks, ripples excepted, is then $4\pi/\Omega$. 


Example 29.4 


In Table 29.7 we show 480 terms of a series of random numbers which can take integral 
values from 0 to 19, together with a moving sum of fives of a moving sum of threes. 
Fig. 29.6 shows a portion of the derived series graphically. There are 474 terms of the 
smoothed series. 

The mean value of our series is 15 × 9.5 = 142.5. The number of upcrosses will be 
found from the table to be 23, the first between the 19th and 20th term of the smoothed 
series, the last between the 459th and the 460th. The mean distance between upcrosses 
is then 440/22 = 20 units. How does this compare with the mean distance given by 
"normal" theory? 

The weights of the graduation are [1, 2, 3, 3, 3, 2, 1] and from (29.32) we have 

$$ \cos\theta = \frac{(1 \times 2) + (2 \times 3) + \ldots + (2 \times 1)}{1^2 + 2^2 + \ldots + 1^2} = \frac{34}{37} = 0.9189, \qquad \theta = 23^\circ\, 14'. $$

Hence the mean distance = 360/23.233 = 15.5 units. 

TABLE 29.7 
Series of 480 Terms of a Rectangular Random Series ε and a [5] [3] smoothing S. 


5 | | 
He 10 ON ele ere te cal spe tale Lr i Ste IEEE D PRIORES SERIE | emai 


1| 3 49| 2| 61 241 113 [423 | 14 | 172 
4 E 2 * g 242 166 | 434 | 14 | 155 
Aen Bs 243 170|435| 9 |133 
4 5| 91 244 179} 436| 8 |107 
5/19/147]53/10) 92 245 179] 437| 3| 75 
6| 1/143}54| 3/101 246 184|438| 1| 53 
7| 3|145|55| 2| 119 247 190|439| 3| 55 
8/12/1605 |56 |11 | 141 948 194|440| 1| 72 
9|19|175 f 57| 14| 166 249 201|441| 5| or 
250 199] 442)16] 96 
251 193] 443) 8| 91 
952 178 |444| 2| 78 
253 178|445| 0| 75 

Fig. 29.6. Graph of the Last 117 Terms of the Series S of Table 29.7. 

The observed mean distance is 20.0 units, but this is based on rectangular variation, and 
we are, perhaps, entitled to expect some difference from normal theory. For rectangular 
random variables, values distant from the mean occur more frequently, and it is not 
surprising to find oscillations in the series which do not result in upcrosses. 

The number of peaks in the series will be found to be 62, the first at the seventh term, 
the last at the 466th. Hence the mean distance between peaks is 459/61 = 7.5 units. From 
formula (29.37) we find 

$$ \cos\theta_1 = \tfrac{2}{3}, \qquad \theta_1 = 48^\circ\, 11'. $$

Thus the theoretical mean distance is 360/48.187 = 7.5 units, in good agreement with 
experiment. It will be observed that several of the distances between peaks are due to very 
small ripples. 

From a number of experiments Dodd (1939a) concluded that series generated from 


rectangular material conformed fairly well to normal theory. 
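
The two theoretical values quoted in this example can be recomputed directly from (29.32) and (29.37). The sketch below is in Python; the function names are illustrative, and the peak formula is obtained by applying (29.32) to the weights of the first-difference series, which is equivalent to (29.37).

```python
import math

def mean_upcross_distance(a):
    """2*pi/theta, with cos(theta) given by (29.32)."""
    num = sum(x * y for x, y in zip(a, a[1:]))
    den = sum(x * x for x in a)
    return 2 * math.pi / math.acos(num / den)

def mean_peak_distance(a):
    """2*pi/theta_1, with cos(theta_1) as in (29.37)."""
    # Weights of u_{t+1} - u_t: (-a_1, a_1 - a_2, ..., a_{k-1} - a_k, a_k).
    b = [-a[0]] + [x - y for x, y in zip(a, a[1:])] + [a[-1]]
    return mean_upcross_distance(b)

weights = [1, 2, 3, 3, 3, 2, 1]
upcross = mean_upcross_distance(weights)
peak = mean_peak_distance(weights)
```

`upcross` is about 15.5 and `peak` about 7.5, the values found above.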


29.29. Let us now examine how the variance of the induced oscillation compares 
with the variance of the original random series. 

The sum of k random elements with variance v has variance kv and its mean has 
variance v/k. It does not follow that a simple moving average has a variance 1/k times 
that of the random element, because of correlations between successive members in the 
derived series. If the original series was $\varepsilon_1, \ldots, \varepsilon_n$ the derived series is, with weights 
$a_1, \ldots, a_k$, 

$$
\begin{aligned}
a_1\varepsilon_1 + a_2\varepsilon_2 + \ldots + a_k\varepsilon_k &= \eta_1, \text{ say,} \\
a_1\varepsilon_2 + a_2\varepsilon_3 + \ldots + a_k\varepsilon_{k+1} &= \eta_2, \\
&\;\;\vdots \\
a_1\varepsilon_{n-k+1} + a_2\varepsilon_{n-k+2} + \ldots + a_k\varepsilon_n &= \eta_{n-k+1}.
\end{aligned}
\tag{29.40}
$$

The expected value of the sum of these values is zero since the expected value of $\varepsilon$ may be 
taken to be so. Since there are n − k + 1 terms we have for the variance 

$$ \frac{1}{n-k+1}\,\Sigma(\eta^2). \tag{29.41} $$

The expected value of this, since the $\varepsilon$'s are independent, is 

$$ \frac{1}{n-k+1}\,E\{\Sigma(\eta_j^2)\} = E(\eta_j^2) = v\,(a_1^2 + a_2^2 + \ldots + a_k^2). \tag{29.42} $$

In particular, if the a's are all equal to 1/k, the expected value of the variance is v/k. This 
gives us the average reduction in the variance. 

If a simple average of extent k is iterated q times the weights are the successive 
coefficients in 

$$ \frac{1}{k^q}(1 + x + x^2 + \ldots + x^{k-1})^q. $$

The sum of squares of these coefficients is the coefficient of $x^{q(k-1)}$ in 

$$ \frac{1}{k^{2q}}(1 + x + \ldots + x^{k-1})^{2q} = \frac{1}{k^{2q}}\,\frac{(1 - x^k)^{2q}}{(1 - x)^{2q}} \tag{29.43} $$

and this gives the average reduced variance for a simple average of k iterated q times. 
The following are the values of the reducing factor for some of the values of k and q: 

                    q 
   k      1      2      3      4      5 
   3     0.33   0.23   0.19   0.17   0.15 
   4     0.25   0.17   0.14   0.12   0.11 
   5     0.20   0.14   0.11   0.10   0.09 
   6     0.17   0.11   0.09   0.08   0.07 
   7     0.14   0.10   0.08   0.07   0.06 

Evidently the result of the first moving average is to generate a series with a much 
lower variance than that of the original random element, but the second and succeeding 
iterations do not reduce the variance further to the same extent. In the case k = 7 the 
first averaging reduces the variance to one-seventh, but the next three reduce it only by 
a further half. 
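
The reducing factors may be recomputed by forming the iterated weights explicitly and summing their squares, as in (29.42). A minimal sketch in Python, with illustrative function names:

```python
from fractions import Fraction

def convolve(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def reducing_factor(k, q):
    """Sum of squared weights of a simple average of extent k iterated q times."""
    w = [1]
    for _ in range(q):
        w = convolve(w, [1] * k)
    return Fraction(sum(x * x for x in w), k ** (2 * q))

factor_table = {(k, q): round(float(reducing_factor(k, q)), 2)
                for k in range(3, 8) for q in range(1, 6)}
```

For instance `reducing_factor(3, 2)` is 19/81, or 0.23 to two places, and `reducing_factor(4, 2)` is 11/64, or 0.17.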


29.30. To apply such results in practice we require an estimate of the variance of 
the random element in the original series. If this is available we can estimate the variance 
of the generated series and also, from 29.26, the mean distance between upcrosses or 
between peaks. If then our residual series, after the elimination of trend, showed an oscilla- 
tory movement with this variance and these mean-distances, within sampling limits, we 
could not conclude that the oscillatory effect was real. It could have been induced by 
our method of eliminating trend. 

In the present state of knowledge it is not possible to assign permissible limits of 
sampling variation by relation to standard errors in the usual way. Whether any particular 
effect is significantly different from the values of the series generated from the random 
element remains, therefore, a matter of subjective judgment to some extent. The sampling 
problems involved are formidable, but there does not seem any reason why they should 
not be capable of explicit solution. This field of study awaits the attention of the theorist. 


Example 29.5 

For the data of Table 29.3 (sheep population of England and Wales) trend was elimi- 
nated by a simple average of nines, the resulting residuals being shown in Table 29.8. 
A glance at the series suggests some sort of oscillatory effect, since the signs of terms cluster 
together. By the methods of the next chapter the effect may be brought into greater 
prominence. The data themselves, however, indicate a mean-distance between upcrosses 
of about 8 or 9 years, and actual calculation gives a variance of 8474. Can this be due 
to the operation of our trend-elimination on a random element in the original series ? 

For the mean distance between upcrosses due to a simple nine-point average we have 

$$ \cos\theta = \tfrac{8}{9}, \qquad \theta = 27^\circ\, 16', $$

and the mean distance is 360/27.27 = 13.2 approximately. This is considerably in excess of 
our observed value, but not sufficiently so to reject outright the possibility we are examining. 
Since, however, the variance of residuals is 8474 this must, to have been generated 
from a random series by a simple average of nines, derive from a random element with 

TABLE 29.8 

Residual Values of the Sheep Series of Table 29.3 after Elimination of Trend by a Simple 
Nine-Point Moving Average. 

Year.   Residual (10,000).   Year.   Residual (10,000).   Year.   Residual (10,000). 
1871 — 176 1893 | + 34 1915 + 19 
72 — 112 94 | — 103 16 | + 128 
73 + 50 95 — 104 Te a - E07 
74 +141 96 — 15 187] + 69 
75 + 60 97 — 23 19 — 29 
76 — 20 98 + 17 20 — 174 
77 + 12 99 + 71 21 — 107 
78 + 82 1900 + 35 22 — 142 
79 + 130 01 + 16 23 — 109 
80 - l4 02 — 27 24 — 233 
81 — 166 03 — 32 25 + 60 
82 — 179 04 — 49 26 + 121 
83 — 84 05 — el 27 + 94 
84 + 38 06 — 52 28 — 25 
85 + 97 07 — 24 29 — 90 
86 + 8 08 + 68 30 — 75 
87 —- 5 09 + 141 31 | + 72 
88 — 105 10 + 119 32 + 152 
89 — 99 l1 + 66 33 +112 
90 + 35 12 — 52 34 — 64 
91 + 159 13 — 117 35 — 87 
92 + 167 14 — 61 


variance 76,266. An estimate of the variance of the random element in the original series, 
obtained by the variate-difference method which we describe below, was only 350 
approximately. Making every allowance for sampling effects, we cannot do otherwise than reject 
decisively the possibility that the residual oscillation is spurious in the sense of having 
been induced into the data by the effect of the elimination of trend on a random element. 
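
The figures of this example follow from (29.32) and (29.42) with nine equal weights. The sketch below, in Python, merely repeats the arithmetic; the variable names are chosen here for illustration, and the values 8474 and 350 are those quoted in the example.

```python
import math

k = 9                                   # simple nine-point average
cos_theta = (k - 1) / k                 # (29.32) with equal weights
theta_degrees = math.degrees(math.acos(cos_theta))
mean_upcross_distance = 360 / theta_degrees

residual_variance = 8474                # variance of the residuals in Table 29.8
implied_random_variance = k * residual_variance   # since (29.42) gives v/k
estimated_random_variance = 350         # variate-difference estimate quoted above
ratio = implied_random_variance / estimated_random_variance
```

The implied variance is 76,266, more than two hundred times the estimate of 350, which is the basis of the decisive rejection.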


29.31. We may summarise the foregoing discussion of trend-elimination as follows :— 

(a) The conception of a trend as a “ smooth” or “ regular " movement is equivalent 
to the supposition that trend can be represented, at least locally, by a smooth mathematical 
function and in particular by a polynomial in the time-variable. 

(b) Certain series can be treated on lines formally equivalent to regression analysis; 
but a more generally applicable procedure is to represent the trend by a moving 
parabolic arc. 

(c) The moving arc of best fit in the least-squares sense gives values which are 
derivable from a moving average of the data. The weights of this average are to some extent 
at choice, according to the extent of the average and the closeness of fit required in the 
moving arc. 

(d) A moving average of extent k sacrifices (k − 1) terms, in the sense that the derived 
series is (k − 1) terms shorter than the original series. If the series is short it is usually 
desirable to keep this loss to a minimum, that is, to keep the extent of the average as 
short as possible. 

(e) A moving average may distort genuine oscillatory effects, in general exaggerating 
the shorter variations at the expense of the longer ones, and may induce spurious oscillatory 
phenomena by its action on random residuals. For harmonic components the effect is 
minimised by taking the average as simple, with extent equal to the period of the 
component. For random components the effect is minimised by making the sum of squares 
of weights in the average a minimum, i.e. by using a simple average. 


29.32. In the theory of time-series there are very few rules which can be laid down 
without a good deal of proviso and caveat. It will be evident from the foregoing that there 
is no golden rule in trend-fitting which can be applied irrespective of individual 
circumstances. If we desire to get a close fit to the data we must use a parabola of fairly high 
order, which involves a moving average with weights which are far from equal. This, 
however, increases the danger of obscuring the true oscillations in the residuals. In 
most practical cases it is necessary to strike a balance between conflicting requirements 
by intuitive judgment as to the appropriate moving average to use. 


The Variate-difference Method 


29.33. We now proceed to consider the random constituent of a time-series. From
the very nature of random variation we cannot expect to derive any formula, however
approximate, which will measure the random component directly at any given point of
the series. The best we can hope to do is to determine the non-random components and
to obtain a random residual which is left unaccounted for by those components; and even
this, as we shall see in the next chapter, is not a very strong hope when oscillations appear
in the series.

On certain assumptions, however, we may determine the variance of the random
component and hence obtain a general idea of its magnitude and importance. Suppose
that the systematic part of the series can be represented, at least locally, by a polynomial.
Then successive differencing of the series will gradually eliminate the polynomial element
but will not reduce the random element correspondingly. As we proceed with the
differencing, the random element becomes more and more predominant until finally the
systematic component is negligible. Hence we can determine effectively the variance of the
random component in the differenced series, and by a simple calculation derive an estimate
of that in the original series.

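29.34. Consider the differencing of a random series $\varepsilon_t$. We have

$$\Delta \varepsilon_t = \varepsilon_{t+1} - \varepsilon_t \tag{29.44}$$

$$\Delta^r \varepsilon_t = \varepsilon_{t+r} - \binom{r}{1}\varepsilon_{t+r-1} + \binom{r}{2}\varepsilon_{t+r-2} - \dots + (-1)^r \varepsilon_t . \tag{29.45}$$

Without loss of generality we may suppose that the mean value of $\varepsilon_t$ is zero, and thus

$$E(\Delta^r \varepsilon_t) = 0 . \tag{29.46}$$

Hence, writing $v$ for the variance of $\varepsilon_t$ and noting that products of different members
of a random series have zero expectation,

$$\begin{aligned}
\operatorname{var}(\Delta^r \varepsilon) &= E(\Delta^r \varepsilon)^2 \\
&= E\left\{\varepsilon_{t+r} - \binom{r}{1}\varepsilon_{t+r-1} + \dots + (-1)^r \varepsilon_t\right\}^2 \\
&= E\left\{\varepsilon_{t+r}^2 + \binom{r}{1}^2 \varepsilon_{t+r-1}^2 + \dots + \varepsilon_t^2\right\} \\
&= v\left\{1 + \binom{r}{1}^2 + \binom{r}{2}^2 + \dots + 1\right\}.
\end{aligned}$$

The sum in curly brackets is easily evaluated from the consideration that it is the coefficient
of $x^r$ in $(1 + x)^r (x + 1)^r$, that is, equals $\binom{2r}{r}$. Hence

$$\operatorname{var}(\Delta^r \varepsilon) = \binom{2r}{r}\, v . \tag{29.47}$$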
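The identity behind (29.47) can be checked numerically. The following is a minimal sketch in Python; the function names are illustrative and not part of the text. It builds the coefficients of the r-th difference and confirms that their squares sum to the central binomial coefficient, which is the factor by which r-fold differencing multiplies the variance of an uncorrelated series.

```python
from math import comb


def difference_coefficients(r):
    """Coefficients of eps[t+r], eps[t+r-1], ..., eps[t] in the r-th difference."""
    return [(-1) ** k * comb(r, k) for k in range(r + 1)]


def variance_factor(r):
    """Sum of squared coefficients, i.e. var(r-th difference) / var(eps)."""
    return sum(c * c for c in difference_coefficients(r))


if __name__ == "__main__":
    for r in range(1, 11):
        # Identity used in the text: the sum of squared binomials is C(2r, r).
        assert variance_factor(r) == comb(2 * r, r)
        print(r, variance_factor(r), 1 / variance_factor(r))
```

The printed reciprocals are the divisors applied to the second moment of the r-th differences when estimating v.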


We may then derive an estimate of v by writing

$$v = \frac{\mu_2'(\Delta^r \varepsilon)}{\binom{2r}{r}} . \tag{29.48}$$

It is to be noticed that we use the second moment about zero, not the observed variance
of $\Delta^r \varepsilon$, since the mean is known to be zero. This shortens the arithmetic to some extent.

The factor $1/\binom{2r}{r}$ for r = 1 to 10 has the following values:

     r     C(2r, r)     1/C(2r, r)
     1            2     0.5
     2            6     0.166667
     3           20     0.05
     4           70     0.0142857
     5          252     0.00396825
     6          924     0.00108225
     7        3,432     0.000291375
     8       12,870     0.0000777001
     9       48,620     0.0000205677
    10      184,756     0.00000541254


29.35. Basing itself on equation (29.48) the method of variate-differences proceeds
as follows: We difference the series once, find the second moment about zero of the
resultant and divide by 2; we then difference again and find the second moment about zero,
dividing in this case by 6; and so on. If the successive estimates of v decrease, we
continue with the differencing. There will, in general, come a point when they cease decreasing
and remain constant within sampling limits (which may be rather wide). At this stage
we may suppose that we have eliminated the systematic element in the original series.
The final estimate gives us an estimate of the variance of the random element in the original
series, and the order of the difference to which we have had to go will give an indication
of the degree of the polynomial representing the systematic component.
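The procedure just described can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the test series (a pure cubic with no random element) are assumptions made for the example, not data from the text. Each estimate is the second moment about zero of the r-th differences divided by C(2r, r).

```python
from math import comb


def difference(series):
    """First differences u[t+1] - u[t] of a sequence."""
    return [b - a for a, b in zip(series, series[1:])]


def variate_difference_estimates(series, max_order):
    """Return [(r, estimate of v)] for r = 1, ..., max_order.

    The estimate at order r is the second moment about zero of the
    r-th differences, divided by C(2r, r).
    """
    estimates = []
    current = list(series)
    for r in range(1, max_order + 1):
        current = difference(current)
        if not current:
            break  # series too short for further differencing
        second_moment = sum(x * x for x in current) / len(current)
        estimates.append((r, second_moment / comb(2 * r, r)))
    return estimates


if __name__ == "__main__":
    # Hypothetical data: a pure cubic, so the estimates fall to zero
    # once the fourth difference is reached.
    cubic = [t ** 3 for t in range(20)]
    for r, v in variate_difference_estimates(cubic, 6):
        print(r, v)
```

With real data the estimates do not fall to zero but level off at the variance of the random element, and the decision when to stop differencing remains a matter of judgment within sampling limits.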


Example 29.6

Let us apply the variate-difference technique to the series of Table 29.6. We know
from the method of constructing the series that the systematic part ought to be completely
eliminated after the third differencing, and also that the random part consists of an element
with variance 833 approximately. In fact, the random numbers from 1 to N have a
variance (N² − 1)/12 and N in this case is 100. The actual variance of the random element
in Table 29.6 is 843.


TABLE 29.9
Differences of the Series u_t of Table 29.6.

  t    u_t   Δu_t   Δ²u_t   Δ³u_t   Δ⁴u_t   Δ⁵u_t   Δ⁶u_t
  1    -96     -6     67     155     279     508    1050
  2    -90    -73    -88    -124    -229    -542   -1297
  3    -17     15     36     105     313     755    1524
  4    -32    -21    -69    -208    -442    -769   -1141
  5    -11     48    139     234     327     372     271
  6    -59    -91    -95     -93     -45     101     361
  7     32      4     -2     -48    -146    -260    -229
  8     28      6     46      98     114     -31    -625
  9     22    -40    -52     -16     145     594    1661
 10     62     12    -36    -161    -449   -1067   -2252
 11     50     48    125     288     618    1185    1978
 12      2    -77   -163    -330    -567    -793    -876
 13     79     86    167     237     226      83    -159
 14     -7    -81    -70      11     143     242     137
 15     74    -11    -81    -132     -99     105     551
 16     85     70     51     -33    -204    -446    -655
 17     15     19     84     171     242     209     -64
 18     -4    -65    -87     -71      33     273     690
 19     61     22    -16    -104    -240    -417    -629
 20     39     38     88     136     177     212     216
 21      1    -50    -48     -41     -35      -4     175
 22     51     -2     -7      -6     -31    -179    -650
 23     53      5     -1      25     148     471    1110
 24     48      6    -26    -123    -323    -639    -975
 25     42     32     97     200     316     336      41
 26     10    -65   -103    -116     -20     295     925
 27     75     38     13     -96    -315    -630    -965
 28     37     25    109     219     315     335     207
 29     12    -84   -110     -96     -20     128     316
 30     96     26    -14     -76    -148    -188     -32
 31     70     40     62      72      40    -156    -798
 32     30    -22    -10      32     196     642    1597
 33     52    -12    -42    -164    -446    -955   -1719
 34     64     30    122     282     509     764     950
 35     34    -92   -160    -227    -255    -186     141
 36    126     68     67      28     -69    -327    -991
 37     58      1     39      97     258     664    1515
 38     57    -38    -58    -161    -406    -851   -1492
 39     95     20    103     245     445     641     707
 40     75    -83   -142    -200    -196     -66     281
 41    158     59     58      -4    -130    -347    -685
 42     99      1     62     126     217     338     509
 43     98    -61    -64     -91    -121    -171    -314
 44    159      3     27      30      50     143     432
 45    156    -24     -3     -20     -93    -289    -745
 46    180    -21     17      73     196     456
 47    201    -38    -56    -123    -260
 48    239     18     67     137
 49    221    -49    -70
 50    270     21
 51    249
Table 29.9 shows the series and the differences up to $\Delta^6$. For the sums of squares
in the various columns, $S_j$ corresponding to $\Delta^j$, we find:

    S_1 =    107,541
    S_2 =    318,115
    S_3 =  1,033,513
    S_4 =  3,445,308
    S_5 = 11,720,069
    S_6 = 40,548,844

To obtain second moments we divide by 51 − j and then, to obtain the estimate of v,
by $\binom{2j}{j}$. We find the following:

    j    Estimate
    1    1075.41
    2    1082.02
    3    1076.58
    4    1047.21
    5    1011.05
    6     975.20

Curiously enough, the estimate for j = 2 is higher than that for j = 1 and there is
little difference between the various estimates. In the ordinary way we should have
concluded that the systematic component was adequately represented by a polynomial
of order 1, that is to say a straight line, and that the residual random element had a variance
of about 1000.
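The arithmetic of this example can be reproduced directly from the quoted sums of squares. The sketch below is illustrative (the names are not from the text); it takes the six values of S_j and the series length of 51 as given above and applies the two divisions described.

```python
from math import comb

# Sums of squares S_j quoted in the text for the differences of Table 29.9.
SUMS_OF_SQUARES = {
    1: 107_541,
    2: 318_115,
    3: 1_033_513,
    4: 3_445_308,
    5: 11_720_069,
    6: 40_548_844,
}
N_TERMS = 51  # length of the original series


def estimate_of_v(j, s_j, n=N_TERMS):
    """Second moment about zero of the j-th differences, divided by C(2j, j)."""
    return s_j / ((n - j) * comb(2 * j, j))


if __name__ == "__main__":
    for j, s_j in SUMS_OF_SQUARES.items():
        print(j, round(estimate_of_v(j, s_j), 2))
```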

The reader must not be surprised to find discrepancies of this kind between theory
and experiment in short series; and the discrepancy is not, in fact, as big as it seems.
The variance of the original series is 6272.61. The mean square of the first difference,
divided by 2, is 1075.41, so that about five-sixths of the variance has been eliminated by
the first differencing, and the method indicates, quite correctly, that the greater part of
the systematic element is linear. The random element is rather large compared with the
non-linear systematic terms, and the latter have got caught up in it; the series is too short
for the variate-difference method to disentangle them. Consider, for instance, the cubic
term $\frac{1}{100}(t - 26)^3$. In the original series this varies in value from −156.25 to +156.25.
First differences reduce it to $\frac{3}{100}(t - 26)^2$, varying from 18.75 through zero to 18.75,
whereas the random element is increased in range from 0 to 198. Already the systematic
term is being swamped by the random element, and a slight degree of accidental correlation
between the two can easily account for the increase in the mean-square of second differences.

The matter may be put in a slightly different way. Suppose that, relying on the 
variate-difference method, we regarded the data as represented by a linear equation plus 
a random residual. If we fitted a straight line by least squares and examined the residuals, 
we should probably find very little evidence of departure from randomness. This repre- 
sentation would differ from the mode of construction of the series, but it would be a possible 
method of construction. Only the failure of the representation to conform to further 
terms of the series would reveal its weakness. 


29.36. The variate-difference method thus provides a kind of lower limit to the
degree of the polynomial which will represent a series locally or generally. There remains
for consideration the question as to what sort of differences between successive estimates
of v can be regarded as chance effects, in order to decide when the value has reached a
stationary level. The sum of squares $S_j$ is a constant factor times the second moment,
but as its members are correlated among themselves we cannot use the variance of the
second moment to test its significance. Further, $S_j$ and $S_{j+1}$ are correlated. We proceed
to derive the sampling variance of their difference, the somewhat complicated formulae
being due to Anderson (1914).


29.37. Write

$$b_j = \binom{r}{j} . \tag{29.49}$$

Then we have, as in (29.42),

$$E\left\{\frac{S_r}{(n-r)\binom{2r}{r}}\right\} = \mu_2 , \tag{29.50}$$

where $\mu_2$ is the variance of u. Further

$$\begin{aligned}
E(\Delta^r u)^4 = E\big[ &\{b_0 u_{r+1} - b_1 u_r + b_2 u_{r-1} - \dots + (-1)^r b_r u_1\}^2 \\
+ &\{b_0 u_{r+2} - b_1 u_{r+1} + b_2 u_r - \dots + (-1)^r b_r u_2\}^2 + \dots \\
+ &\{b_0 u_n - b_1 u_{n-1} + b_2 u_{n-2} - \dots + (-1)^r b_r u_{n-r}\}^2 \big]^2 .
\end{aligned} \tag{29.51}$$

Consider first of all the terms in this which result in fourth powers of u. They will
derive from

$$\begin{aligned}
E\{ & b_0^4 (u_n^4 + u_1^4) + (b_0^2 + b_1^2)^2 (u_{n-1}^4 + u_2^4) + (b_0^2 + b_1^2 + b_2^2)^2 (u_{n-2}^4 + u_3^4) + \dots \\
& + (b_0^2 + b_1^2 + \dots + b_{r-1}^2)^2 (u_{n-r+1}^4 + u_r^4) \\
& + (b_0^2 + b_1^2 + \dots + b_r^2)^2 (u_{n-r}^4 + u_{n-r-1}^4 + \dots + u_{r+1}^4) \} .
\end{aligned} \tag{29.52}$$

Writing now

$$B_0^2 = (b_0^2)^2 + (b_0^2 + b_1^2)^2 + \dots + (b_0^2 + b_1^2 + \dots + b_{r-1}^2)^2 \tag{29.53}$$

$$A_0^2 = (b_0^2 + b_1^2 + \dots + b_r^2)^2 = \binom{2r}{r}^2 \tag{29.54}$$

we see that the term in $E(u^4)$ is

$$\{A_0^2 (n - 2r) + 2B_0^2\}\, E(u^4) . \tag{29.55}$$

The only other term appearing from (29.51) will be of type $E(u_l^2 u_m^2)$, $l \neq m$. If the reader
will write out the expansion of (29.51) he will find that the coefficients are expressible in
terms of

$$A_j^2 = (b_0 b_j + b_1 b_{j+1} + \dots + b_{r-j} b_r)^2 = \binom{2r}{r-j}^2 \tag{29.56}$$

and

$$B_j^2 = (b_0 b_j)^2 + (b_0 b_j + b_1 b_{j+1})^2 + \dots + (b_0 b_j + b_1 b_{j+1} + \dots + b_{r-j-1} b_{r-1})^2 . \tag{29.57}$$
The expression for $E(\Delta^r u)^4$ reduces to:
(n — 2r) A? E (u!) + 4 { (n — 2r + 1) A} + (n — 2r -2) A3... 


p» 
+ A? (n — 2r + r) ) E (uj un) + 2B; E (u*) 
+8 {B} +B} +... +B, + Bi} E (uj ws i ; . (29.58) ' 
Substituting $\mu_4$ for $E(u^4)$ and $\mu_2^2$ for $E(u_l^2 u_m^2)$, dividing by $(n - r)^2 \binom{2r}{r}^2$ and subtracting
$\mu_2^2$, we find the sampling variance of the estimate of v. The expression can, however, be
simplified to some extent. Putting
#1 r-2 2 r—3 2 2 
A Gy AC Gea) PEOR 
nA G6) EO GJ "20 Gia | 
eren m T (29.59) | 
0 T 
we find, after lengthy algebraic rearrangement, 
2T 
2,2 r 
€ S, — Ms — 39$ J 1 DN 
2r nh—r (n — r) 
(n — 7) E r » 
4r 
2u2 2r r 
n—r)\/2r\? 2@—n? Late ys : . (29.60) 
(*) | 
If terms of order $(n - r)^{-2}$ can be neglected, this reduces to

$$\frac{1}{n - r}\left\{\mu_4 - 3\mu_2^2 + \frac{\binom{4r}{2r}}{\binom{2r}{r}^2}\, 2\mu_2^2\right\} \tag{29.61}$$

or, using the Stirling approximation to factorials,

$$\frac{1}{n - r}\left\{\mu_4 - 3\mu_2^2 + \mu_2^2 \sqrt{2r\pi}\right\}, \tag{29.62}$$

which is a fair approximation to (29.61), being within 3 per cent. for r as low as 6.
When the population of values of u is normal, $\mu_4 - 3\mu_2^2$ vanishes and the formula
simplifies accordingly.
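The closeness of the Stirling form can be examined numerically. The sketch below is illustrative and rests on one stated reading: that the coefficient of $\mu_2^2$ in (29.61) is $2\binom{4r}{2r}/\binom{2r}{r}^2$ and that (29.62) replaces it by $\sqrt{2\pi r}$. The function names are assumptions of the example.

```python
from math import comb, pi, sqrt


def exact_factor(r):
    """Coefficient of mu_2^2 in (29.61): 2 * C(4r, 2r) / C(2r, r)^2."""
    return 2 * comb(4 * r, 2 * r) / comb(2 * r, r) ** 2


def stirling_factor(r):
    """Coefficient of mu_2^2 in (29.62): sqrt(2 * pi * r)."""
    return sqrt(2 * pi * r)


def relative_gap(r):
    """Fraction by which the Stirling form falls short of the exact factor."""
    return 1 - stirling_factor(r) / exact_factor(r)


if __name__ == "__main__":
    for r in (2, 4, 6, 8, 10):
        print(r, exact_factor(r), stirling_factor(r), relative_gap(r))
```

On this reading the gap is roughly 3 per cent. at r = 6 and shrinks as r increases, which agrees with the remark in the text for the normal case.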


29.38. In a similar way it may be shown that 
S, Sra 
elana) ee (7 + 3! 
T r+1 
== at TENE zs s 
n —rT : CCG tr) @=r-» 


pe 

2u3 2r 2n — 2r — 1 r+l 

pi cum (a We 2\ R= Pol SAEN) . (29.63) 
r PEL 
where
ge 2 2 m 2 2 2 2 
nA Gas) 20 a eo 
TRAC umso Qro) tuos ceto) ST 
From (29.60) and (29.63) we can determine the variance of the difference of

$$\frac{S_r}{(n-r)\binom{2r}{r}} \quad\text{and}\quad \frac{S_{r+1}}{(n-r-1)\binom{2r+2}{r+1}} .$$

The general formula is complicated, but for normal variation, large n and r > 6 we have,
analogously to (29.62),

$$\operatorname{var}\left\{\frac{S_r}{(n-r)\binom{2r}{r}} - \frac{S_{r+1}}{(n-r-1)\binom{2r+2}{r+1}}\right\} = \frac{(3r+1)\sqrt{2\pi r}}{2(2r+1)^3 (n-r-1)}\, \mu_2^2 . \tag{29.64}$$

The arithmetic application of the formulae has been facilitated by the preparation of tables
of the constants involved. Reference may be made to Tintner (1940) who gives tables
prepared by himself, Anderson and Zaycoff.

Example 29.7 


For the data of Table 29.3 (sheep population) an application of the variate-difference
method up to the tenth difference gave the following results:

     r     $S_r \big/ (n-r)\binom{2r}{r}$
     1        3468
     2        1442
     3         854
     4         629
     5         518
     6         448
     7         401
     8         371
     9         357
    10         347

The values here are falling steadily from r = 1 to r = 10, but very slightly towards
the end. From (29.64) for r = 6 we have for the variance of the difference, 80.7
approximately and for r = 10, 25.8 approximately. It appears that the reduction in variance
at r = 10 is losing significance, and that a moving arc of degree 10 would be sufficient to
eliminate the systematic component. It does not, of course, follow that the trend-line
must be of this degree, for we may not want to eliminate the oscillatory movements in
the trend-line.
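The two variances quoted in this example can be recovered from the large-sample approximation. The sketch below is hedged on two points. First, it reads (29.64) as $(3r+1)\sqrt{2\pi r}\,\mu_2^2 / \{2(2r+1)^3(n-r-1)\}$, with $\mu_2$ replaced by the estimate at order r. Second, the series length n = 73 is an assumption, since the length of the sheep series is not restated in this passage; it is used here because it reproduces the quoted figures.

```python
from math import pi, sqrt


def variance_of_difference(r, mu2, n):
    """Large-sample variance of the difference of successive estimates.

    Read here as (3r + 1) * sqrt(2*pi*r) * mu2^2
                 / (2 * (2r + 1)^3 * (n - r - 1)).
    """
    return (3 * r + 1) * sqrt(2 * pi * r) * mu2 ** 2 / (
        2 * (2 * r + 1) ** 3 * (n - r - 1)
    )


if __name__ == "__main__":
    n = 73  # ASSUMPTION: series length, not stated in this passage
    # mu2 is replaced by the tabulated estimate at the same order r.
    print(variance_of_difference(6, 448, n))
    print(variance_of_difference(10, 347, n))
```

The square roots of these variances, about 9 and 5, give a rough scale against which the drops between successive tabulated estimates may be judged.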


29.39. The variate-difference method will clearly not eliminate systematic effects
such as periodic terms with very short period. Consider, for instance, the series 1, −1,
1, −1, etc. The first differences give us a series 2, −2, 2, −2, etc., second differences
4, −4, 4, −4, etc., and so on. The variance of the series of rth differences is, neglecting
effects due to the shortness of the series, $2^{2r}$ times that of the original, and the quotient
when this is divided by $\binom{2r}{r}$ tends to

$$\frac{2^{2r} (r!)^2}{(2r)!} \sim \sqrt{\pi r}$$

and so increases without limit. In such a case we cannot obtain an estimate of the variance
of any random element which may be present.
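The growth of the quotient for the alternating series is easy to tabulate. The sketch below is illustrative (the function name is an assumption); it evaluates $4^r / \binom{2r}{r}$ for a series of unit variance and prints $\sqrt{\pi r}$ alongside for comparison.

```python
from math import comb, pi, sqrt


def alternating_quotient(r):
    """4^r / C(2r, r): the quotient at order r for the series 1, -1, 1, -1, ..."""
    return 4 ** r / comb(2 * r, r)


if __name__ == "__main__":
    for r in range(1, 11):
        # The quotient keeps growing, roughly like sqrt(pi * r).
        print(r, alternating_quotient(r), sqrt(pi * r))
```

Since the quotient never levels off, the usual stopping rule of the method has nothing to latch on to in such a case.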


NOTES AND REFERENCES 


References to the fitting of polynomials are given at the end of Chapter 22. For the 
moving average see Whittaker and Robinson’s Calculus of Observations and the books by 
Macaulay (1931) and Sasuly (1934). 

Attempts have been made to use trend-lines for purposes of forecasting, and even to 
measure the standard error of a forecast—see Schultz (1930) and a discussion in Davis 
(1941). The methods proposed appear to me theoretically unsound and in practice they 
lead as a rule to such wide limits of error as to be of doubtful value ; but this is a personal 
opinion and the less sceptical reader may care to consult Davis's book and to follow up 
the references given therein. 

For the effect of moving averages on random variables see Yule (1921) and Slutzky 
(19375), the latter being an English version of a paper published in Russian many years 
earlier. See also Dodd (1939a, 1941a). Slutzky proves an interesting theorem—the 
theorem of the sinusoidal limit—to the effect that repeated moving averages of certain 
kinds applied to random series generate a sine-curve. 

For the variate-difference method see the book by Tintner (1940), a very thorough 
practical account with useful tables. The more important earlier memoirs are those by 
Anderson (1914, 1923, 1926), “Student” (1914), Morant (1921), and K. Pearson and 
Cave (1914). 


EXERCISES 


29.1. Show that in the formulae of equation (29.7) and similar formulae of higher 
orders the sum of the weights is unity. 


29.2. By evaluating the solutions of (29.5) determinantally show that a parabolic
curve of second or third order giving a graduation

$$a_{-n} u_{-n} + a_{-n+1} u_{-n+1} + \dots + a_0 u_0 + \dots + a_n u_n$$

has

$$a_j = \frac{3\{3n^2 + 3n - 1 - 5j^2\}}{(2n - 1)(2n + 1)(2n + 3)} .$$


29.3. Show that the weights in the Spencer 21-point formula are

$$\frac{1}{350}\,[-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60, \dots]$$

and that if it is applied to a random series the variance of the resultant is about one-seventh
of the original series, about the same reduction as would be given by a simple moving
average of sevens.


29.4. Show that Macaulay's 43-point formula, 
1 7 
gag 123 (81 51 | v5 =1,0, 0, 00,004, 2.21, 
has weights 
1 
5600 [7 18. 30, 40, 45, 28, — 8, — 60, — 122, — 178, — 205, — 190, — 127, 
— 6, 163, 360, 562, 760, 928, 1050, 1127, 1156, . . .] 

and that it reduces the variance of a random series about as much as a simple average 
of nines. 


29.5. Take a random series of, say, 200 terms and determine “trends” by moving
averages $\frac{1}{9}[9]$, $\frac{1}{81}[9]^2$ and $\frac{1}{729}[9]^3$. Compare the mean distances between peaks and
upcrosses with the theoretical values based on normal theory.


29.6. If $\varepsilon_t$ is a random series, show that the correlation between successive members
of $\Delta^k \varepsilon_t$ for long series is $-\dfrac{k}{k+1}$ and hence tends to −1 as k increases. Hence show
that the signs of successive terms in $\Delta^k u_t$ tend to alternate, where $u_t$ is the sum of a random
element and a systematic element representable by a polynomial; and verify by reference
to Table 29.9.


29.7. By eliminating ô? from (29.19) show that, for a cubic curve, an accurate trend- 


line is given by 
1 h?—1 k*—1 
I Š E h 
[5m -triw) 


and generalise this result. 
(Cf. J. A. Higham, J. Inst. Act. (1882-5), 23, 335; 25, 15, 245.) 


CHAPTER 30 


TIME-SERIES—(2) 


30.1. The present chapter is devoted to a discussion of oscillatory effects in time-series.
We shall suppose that our series is stationary, i.e. has no trend, either because the
original data contained none or because trend has been removed by one of the methods
described in the last chapter. Our typical series will then fluctuate round some constant
value which we may usually, without loss of generality, take to be zero. We shall assume
that there is a prior possibility that part of the variation at least is random. This, indeed,
is necessary if our results are to have any practical application, for most of the series
encountered in practice have some element of irregularity, however small.


TABLE 30.1


Trend-free Wheat-Price Index (European Prices) compiled by Sir William Beveridge for
the Years 1500-1869.


(From Beveridge, 1921.)


30.2. Four examples of the type of series under consideration have already occurred.
The table of Example 21.11 (page 126) gives the deviations from a simple nine-year moving
average of the yields of potatoes in tenths of tons per acre in England and Wales for the
years 1888-1935. Table 29.1 (Fig. 29.1) gives the annual yields of barley in cwts. per
acre in England and Wales for 1884-1939, no nine-year elimination of trend having been
carried out in this case. Table 29.4 (Fig. 29.4) gives rainfall data at London over the
century 1813-1912. Table 29.5 (Fig. 29.5) gives egg-production per laying hen in the
U.S.A.


TABLE 30.2 


Marriage Rate in England and Wales: Deviation from a Simple 11-Year Moving Average 
for the Years 1843-1896. 


Units 1 in 10,000. 


Year.   Marriage Rate.   Year.   Marriage Rate.   Year.   Marriage Rate.

1843 −6 1861 −6 1879 −12
44 1 62 −7 80 −6
45 12 63 1 81 0
46 10 64 | 6 82 5 
47 —6 65 8 83 7 
48 —8 66 9 84 3 
49 —6 67 — 2 85 — 4 
50 3 68 — 8 86 — 8 
51 4 69 — 10 87 — 6 
52 7 70 - 7 88 - 5 
53 11 71 0 89 1 
54 3 72 8 90 6 
55 —8 73 12 91 6 
56 Rx p 74 7 92 2 
57 —3 75 5 93 — 6 
58 =T 76 + 94 - 5 
59 3 77 — 3 95 — 6
60 4 78 — 6 96 1 


Tables 30.1 and 30.2 give two further examples. The first is a famous series of
trend-free wheat-price indices compiled by Sir William Beveridge and extending over 370 years,
a phenomenal length of time for economic series. The second is the deviation from a
simple 11-year moving average of marriage rates for the years 1843-1896.


Oscillation and Cycle

30.3. We will now attempt to define more closely the sense in which we use the
words “oscillation” and “cycle”. It is particularly important to exercise great care in
the use of an accurate nomenclature because a great deal of the literature on this subject
suffers from confusion due to loose wording.
By a cyclical component of a time-series we shall mean one which is a strictly periodic
function of the time, that is to say, for which there exists a period $\omega$ such that

$$u_{t+\omega} = u_t \tag{30.1}$$

whatever the value of t. The periodic functions which we shall consider in particular are
the sine and cosine functions. If the series can be represented as the sum of a cyclical
component and a random constituent, or by a cyclical component alone, we may speak
of it as a cyclical series.


30.4. If the series is not random it must move with more or less regularity about
the mean value, and we shall then speak of it as oscillatory. The oscillatory movement
may be in part due to random elements but must not be entirely so. A cyclical series is
oscillatory, but an oscillatory series is not necessarily cyclical.

An oscillatory movement may be the sum of two or more cyclical components.
Consider, for instance, the sum of two periodic terms

$$u_t = \sin\frac{2\pi t}{\omega_1} + \sin\frac{2\pi t}{\omega_2} .$$

If $\omega_1$ and $\omega_2$ are commensurable there will be numbers, and in particular a smallest number
$\omega$, which is an exact multiple of both of them. This is clearly a period of the series.
But if $\omega_1$ and $\omega_2$ are not commensurable there will be no period of this kind and the sum
will be oscillatory but not cyclical.
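The commensurable case can be illustrated with a short Python sketch. The integer periods 6 and 10 are hypothetical values chosen for the example, and the function names are not from the text. The smallest common multiple of the two periods is found and the sum of the two sine terms is checked to repeat after that interval.

```python
from math import gcd, pi, sin


def two_term_series(t, w1, w2):
    """u_t = sin(2*pi*t/w1) + sin(2*pi*t/w2)."""
    return sin(2 * pi * t / w1) + sin(2 * pi * t / w2)


def common_period(w1, w2):
    """Smallest exact multiple of two integer periods."""
    return w1 * w2 // gcd(w1, w2)


if __name__ == "__main__":
    w1, w2 = 6, 10  # hypothetical integer periods
    w = common_period(w1, w2)
    # The sum repeats after w steps, up to floating-point error.
    worst = max(
        abs(two_term_series(t + w, w1, w2) - two_term_series(t, w1, w2))
        for t in range(100)
    )
    print(w, worst)
```

When the ratio of the two periods is irrational no such common multiple exists, and a check of this kind fails for every candidate interval.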


30.5. It may be felt by the reader that we could reasonably extend the use of the 
word “cyclical” to cover series which are the sum of cyclical terms; but the danger of 
doing so is that within certain limits any series can be represented as a sum of harmonic 
terms, even if it is not itself oscillatory, in virtue of Fourier’s theorem. Admittedly such 
a representation, to be exact, must in general consist of an infinite series of terms and is 
valid only in a certain range, but in practice a comparatively small number of terms often 
gives quite a good approximation. We do not call a function a polynomial because it 
can be expanded in powers of the variable by Taylor’s theorem; and correspondingly 
we shall not call it cyclical because it can be expanded as a sum of harmonic terms by 
Fourier’s theorem. On the whole it seems safer to avoid the word “cyclical” for series
which consist of a finite number of cyclical terms. 


30.6. For our present purposes the main significance of the distinction we are attempt- 
ing to make is that in a cyclical series the maxima and minima, apart from disturbances 
due to the superposition of a random element, occur at equal intervals of time and are 
therefore predictable for a long way into the future—for so long, in fact, as the constitution 
of the system remains unchanged. In oscillatory series, on the other hand, the distances
from peak to peak, trough to trough or upcross to upcross, are not equal, but vary very
considerably. Similarly, in the oscillatory series the amplitudes of the movements may
vary very substantially, whereas in a cyclical series they should be constant (again, except
in so far as superposed random elements disturb them).


30.7. Now the time-series observed in practice are very rarely cyclical as we have
defined the term. The only case among those cited at the beginning of the chapter in which
there appears to be any cyclical movement is that of egg-production per hen in Table 29.5.
The far more usual case is that of varying amplitude and period from peak to peak or upcross
to upcross. We shall therefore begin our study of oscillatory movements by considering
the kinds of scheme which can give rise to the observed phenomena; and then we shall
examine methods of deciding which of the possible schemes should be chosen as the
hypothetical representation in particular cases.


Tests for Randomness 


30.8. The first stage, when confronted with a fluctuating stationary series, is to 
examine whether the fluctuations are purely random. Tests of randomness are easy to
find, and in fact the random series is the happy hunting-ground of the worker whose interests
lie mainly in the mathematics of the direct theory of probability. We have considered 
some tests which are appropriate to the study of oscillatory movement in 21.43 to 21.46. 
Others which have gained popularity are based on the distribution of “ runs ” and on the 
correlation between successive members of the series. The reader will have no difficulty 
in composing others. All these tests are based on the non-parametric case, so that the 
alternative hypotheses are not usually brought specifically into view. We cannot there- 
fore apply the general theory of Chapters 26 and 27 to determine “ best " tests, and in the 
present state of knowledge are forced to be content with less definite ideas. So far as 
ease of application goes, the tests of 21.43 and 21.44 seem to have decided advantages, 
though they may be somewhat insensitive. The method of serial correlation, to which we 
refer below, gives a useful alternative in doubtful cases. In the sequel we shall suppose 
that before proceeding to search for systematic movements we have satisfied ourselves by 
one or more of these tests that such movements exist. 


30.9. We shall consider three schemes which can account for the typical oscillatory
movement usually observed.

(a) Moving Averages. We have already seen in Chapter 29 that a moving average
of a purely random element can generate an oscillatory series with all the required properties
of varying amplitude and mean distances, the Slutzky-Yule effect (29.25). Fig. 29.6
illustrates the kind of oscillation which may arise. It is at least possible that some of the
observed oscillations in time-series may be generated in this way; and in fact Slutzky
(1936) has given an interesting example in which a part of his series generated by the
moving average happens to agree very closely with an observed series.

(b) Sums of Cyclical Components. We may attempt, by Fourier analysis or the more
general harmonic analysis, to represent the oscillations as the sum of a number of cyclical
components. This is the classical approach.

(c) Autoregression Equations. If a series is constructed by the recurrence formula

$$u_{t+1} = f(u_t, u_{t-1}, \dots, u_{t-k}) + \varepsilon_{t+1}, \tag{30.2}$$

where f is a mathematical function and $\varepsilon$ a “disturbance” function which may be a random
variable, then under certain conditions the generated series is of the required type. We
shall consider in particular the series

$$u_{t+2} = -a\,u_{t+1} - b\,u_t + \varepsilon_{t+2}, \tag{30.3}$$

where a and b are constants and $\varepsilon$ is random.

Table 30.3 (Fig. 30.1) shows a series of type (b) in the simplest case where only one
cyclical component is involved, together with a random residual. Table 30.4 (Fig. 30.2)
shows an autoregressive series constructed from random numbers by the formula

$$u_{t+2} = 1.1\,u_{t+1} - 0.5\,u_t + \varepsilon_{t+2} . \tag{30.4}$$
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TABLE 30.3 


Values of the Series u, — 10 sin + & where e, is a Rectangular Random 
Range — 5 to + à, rounded off to Nearest Unit. 


Variable with 


Buber of Series. ee of Series. aie se) pr Series. 
1 3 21 ll 41 | 5 
2 8 22 13 42 12 
3 6 23 10 43 T 
4 2 24 6 44 5 
5 —- 4 25 —- 5 45 3 
6 — 7 26 — 8 46 - 2 
7 - 9 27 — 12 47 — 12 
8 —-9 28 — 10 48 — 12 
9 — 10 29 =- 7 49 — 8 
10 - 1 30 0 50 - 1 
1l 8 31 1 51 11 
12 7 32 8 52 13 
13 6 33 13 53 12 
14 4 34 7 54 | 7 
15 - 3 35 4 55 5 
16 — 10 36 - 9 56 = 1 
17 — 11 37 | = 9 57 — 6 
18 — 15 38 —- 6 58 — 14 
19 — 4 39 | — 4 59 = 8 
20 4 40 | — 2 60 1 
Fig. 30.1.—Graph of the Values of Table 30.3.

TABLE 30.4

Values of Series $u_{t+2} = 1.1 u_{t+1} - 0.5 u_t + \varepsilon_{t+2}$, where $\varepsilon_{t+2}$ is a Rectangular Random
Variable with Range -9.5 to 9.5, rounded off to Nearest Unit.

 Number     Value of       Number     Value of       Number     Value of
of Term.    Series.       of Term.    Series.       of Term.    Series.
    1           7            23          -4            45         -13
    2           6            24      [illegible]       46           3
    3          -6            25          -9            47           6
    4          -4            26          -4            48           4
    5           3            27          -4            49          11
    6          -4            28           3            50          15
    7          -6            29           9            51           9
    8          -1            30           4            52           8
    9          10            31          -8            53           4
   10          10            32          -6            54          -1
   11           6            33          -3            55           4
   12          -4            34          -2            56           7
   13          -4            35           0            57          11
   14          -7            36          -1            58           0
   15          -2            37          -3            59           1
   16           6            38           3            60           0
   17          17            39          -1            61          -6
   18          24            40          -8            62         -11
   19          17            41          -3            63          -8
   20           4            42          -8            64          -3
   21           1            43         -10            65           5
   22      [illegible]       44         -16

Fig. 30.2.—Graph of the Values of Table 30.4.
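The two constructions are easily imitated by machine. What follows is a minimal sketch in Python, not the procedure by which Tables 30.3 and 30.4 were compiled: the function names, the seeds, the burn-in length and the pseudo-random generator are assumptions, and the period of ten units for the sine term is inferred from the tabulated values.

```python
import math
import random


def harmonic_series(n, amplitude=10.0, period=10.0, half_range=5.0, seed=0):
    """Scheme (b): a single sine term plus a rectangular random residual."""
    rng = random.Random(seed)  # assumed generator; any uniform source would do
    return [
        round(amplitude * math.sin(2 * math.pi * t / period)
              + rng.uniform(-half_range, half_range))
        for t in range(1, n + 1)
    ]


def autoregressive_series(n, a=-1.1, b=0.5, half_range=9.5, burn_in=100, seed=0):
    """Scheme (c): u[t+2] = -a*u[t+1] - b*u[t] + e[t+2], cf. (30.3) and (30.4)."""
    rng = random.Random(seed)
    u_prev, u_curr = 0.0, 0.0      # assumed start from rest
    values = []
    for step in range(burn_in + n):
        u_next = -a * u_curr - b * u_prev + rng.uniform(-half_range, half_range)
        u_prev, u_curr = u_curr, u_next
        if step >= burn_in:        # discard the start-up transient
            values.append(round(u_next))
    return values
```

Only the reported values are rounded here; the recursion itself is carried on unrounded, which may or may not agree with the way the printed table was built.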
30.10. It is quite possible that theoretical reasons may suggest other schemes for
study as the subject progresses. For instance, we might wish to consider series defined
by differential equations, on the analogy of the similar equations determining oscillations
in physical phenomena such as vibrating strings or electrical discharges. Something has,
in fact, already been done in this direction. We shall, however, confine our attention
to the three schemes indicated above, and particularly the second and third.


30.11. On the face of it, an observed series exhibiting the typical movements in
amplitude and period might be due to any one of the three schemes or even to a combination
of them. We require, in the first instance, some objective criterion for deciding which of
them is applicable in particular cases. Inspection of the primary data, though useful, is
quite an unreliable guide in making a decision on this point, particularly if the series
is short. Experience seems to indicate that few things are more likely to mislead in the
theory of oscillatory series than attempts to determine the nature of the oscillatory
movement by mere contemplation of the series itself; and yet this is the method, if one can
dignify it by such a term, which has perhaps been most widely used in the past.


Serial Correlation

30.12. Suppose our series of values is $u_1, \ldots, u_n$. Let us form the product-moment
correlation coefficient between successive terms, i.e.

$$r_1 = \frac{\operatorname{cov}(u_j, u_{j+1})}{(\operatorname{var} u_j \, \operatorname{var} u_{j+1})^{1/2}}. \tag{30.5}$$

There will be (n - 1) pairs entering into the correlation, and the variances of $u_j$ and $u_{j+1}$
differ only in the fact that the first relates to the terms $u_1, u_2, \ldots, u_{n-1}$ and the second
to the terms $u_2, u_3, \ldots, u_n$. The coefficient $r_1$ is called the serial correlation coefficient
of the first order, or more briefly the first serial correlation.*

* It is sometimes convenient to confine this expression to values calculated from samples, the
corresponding values for the infinite series being termed "autocorrelations" and denoted by a Greek $\rho$.

More generally, let us define a coefficient of order k:

$$r_k = \frac{\operatorname{cov}(u_j, u_{j+k})}{(\operatorname{var} u_j \, \operatorname{var} u_{j+k})^{1/2}}, \tag{30.6}$$

that is,

$$r_k = \frac{\dfrac{1}{n-k}\sum_{j=1}^{n-k} u_j u_{j+k} - \dfrac{1}{(n-k)^2}\Big(\sum_{j=1}^{n-k} u_j\Big)\Big(\sum_{j=1}^{n-k} u_{j+k}\Big)}{\Big\{\dfrac{1}{n-k}\sum_{j=1}^{n-k} u_j^2 - \dfrac{1}{(n-k)^2}\Big(\sum_{j=1}^{n-k} u_j\Big)^2\Big\}^{1/2} \Big\{\dfrac{1}{n-k}\sum_{j=1}^{n-k} u_{j+k}^2 - \dfrac{1}{(n-k)^2}\Big(\sum_{j=1}^{n-k} u_{j+k}\Big)^2\Big\}^{1/2}}. \tag{30.7}$$

By convention we define

$$r_0 = 1, \qquad r_{-k} = r_k. \tag{30.8}$$

30.13. In practice we often require to calculate serial correlations of fairly high order,
and for long series as many as 60. The arithmetic is tedious but may be systematised so as to
reduce labour, which arises chiefly in the determination of cross-products forming the
covariances.

The series of n terms is written down vertically on each of two slips of paper, the spacing
being equal on the two slips. This can very conveniently be done on a Burroughs tabulator
with a split keyboard, the series being recorded in duplicate and the resulting strip cut up
the middle. To calculate the first product-sum we pin the slips so that the first term
on the right-hand slip is opposite the second term on the left-hand slip, and hence so that
the jth term on the right is opposite to the (j + 1)th on the left all the way down. For
most series the differences of two terms which are opposite can be obtained mentally by
subtraction, squared, and set up on an adding-machine. The sum of squares of differences
is thus determined, and the cross-product found from the simple identity

$$2\Sigma(XY) = \Sigma(X^2) + \Sigma(Y^2) - \Sigma(X - Y)^2.$$

We then move the right-hand slip down one space so that the jth term is opposite the
(j + 2)th term on the left and repeat the process; and so on to as many terms as may
be required.

In this process $\Sigma(X^2)$ and $\Sigma(Y^2)$ are required at each stage, and it is as well to
determine them by cumulative summation from the two ends of the series. $\Sigma(X)$ and $\Sigma(Y)$
are also required. It is also convenient on occasion to reduce the series to zero mean
approximately before beginning the analysis.
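The moving-slip routine can be put in a few lines of Python. This is a sketch only; the function name is hypothetical, and the point is simply that the cross-product is recovered from sums of squares and squared differences, as in the identity above, without any direct multiplication of pairs.

```python
def cross_product_by_differences(series, k):
    """Sum of u[j]*u[j+k] obtained from the identity
    2*S(XY) = S(X^2) + S(Y^2) - S((X - Y)^2)."""
    x = series[:len(series) - k]   # first n-k terms (left-hand slip)
    y = series[k:]                 # last n-k terms (right-hand slip, moved down k places)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(x, y))
    return (sum_x2 + sum_y2 - sum_d2) / 2
```

On a machine the identity saves nothing, since the products can be summed directly; it is shown here because it mirrors the hand computation.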


Example 30.1

To illustrate the arithmetic we will take a very trivial example which the reader should
check for himself. Take the series

-5, -6, -2, 4, 7, 3, 1, -5, -1, 2.

We set up the following scheme of tabulation for calculating serial correlations up to the
fifth order :—

            Σ(X)         Σ(Y)        Σ(X²)         Σ(Y²)
n - k   k   (from        (from       (from         (from      Σ(X - Y)²   Σ(XY)
            beginning)   end)        beginning)    end)
 10     0      -2          -2          170           170           0        170
  9     1      -4           3          166           145         143         84
  8     2      -3           9          165           109         344        -35
  7     3       2          11          140           105         445       -100
  6     4       1           7          139            89         380        -76
  5     5      -2           0          130            40         172         -1

The number n - k is the number of pairs entering into the kth correlation. $\Sigma(X)$ is the
sum of n - k terms beginning at the first term, $\Sigma(Y)$ the corresponding sum of the last
n - k terms, and similarly for $\Sigma(X^2)$ and $\Sigma(Y^2)$. These are the quantities required to
calculate the variances entering into the denominator of the kth serial correlation. The
quantities $\Sigma(X - Y)^2$ are calculated by the moving-slip method described above.

We now calculate the correlation coefficients in the usual way, e.g. for $r_1$

$$\operatorname{var} X = \frac{166}{9} - \Big(\frac{-4}{9}\Big)^2 = 18.247,$$

$$\operatorname{var} Y = \frac{145}{9} - \Big(\frac{3}{9}\Big)^2 = 16.000,$$

$$\operatorname{cov}(X, Y) = \frac{84}{9} - \Big(\frac{-4}{9}\Big)\Big(\frac{3}{9}\Big) = 9.4815,$$

$$r_1 = \frac{9.4815}{\sqrt{18.247 \times 16.000}} = 0.55;$$

and for $r_5$

$$\operatorname{var} X = \frac{130}{5} - \Big(\frac{-2}{5}\Big)^2 = 25.840,$$

$$\operatorname{var} Y = \frac{40}{5} - \Big(\frac{0}{5}\Big)^2 = 8.000,$$

$$\operatorname{cov}(X, Y) = \frac{-1}{5} - \Big(\frac{-2}{5}\Big)\Big(\frac{0}{5}\Big) = -0.200,$$

$$r_5 = -0.01.$$

When n is large and the origin is chosen so that the mean of the whole series is
approximately zero, a sufficiently good value of r is given by $\Sigma(XY)/\{\Sigma(X^2)\,\Sigma(Y^2)\}^{1/2}$, the corrections
required to adjust the sums of squares and products to values about the mean being small ;
but this approximation must be used with some care and in any case the first two or three
serial coefficients should be worked out exactly.
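The whole computation of Example 30.1 may be checked mechanically. The sketch below follows definition (30.7), in which each segment of n - k terms has its own mean and variance; the function name is an assumption made for illustration.

```python
import math


def serial_correlation(series, k):
    """Serial correlation of order k as in (30.7): the first n-k terms
    are correlated with the last n-k terms, each about its own mean."""
    m = len(series) - k
    x, y = series[:m], series[k:]
    mean_x, mean_y = sum(x) / m, sum(y) / m
    var_x = sum(v * v for v in x) / m - mean_x ** 2
    var_y = sum(v * v for v in y) / m - mean_y ** 2
    cov = sum(p * q for p, q in zip(x, y)) / m - mean_x * mean_y
    return cov / math.sqrt(var_x * var_y)


example = [-5, -6, -2, 4, 7, 3, 1, -5, -1, 2]
correlogram = [serial_correlation(example, k) for k in range(6)]
```

To two decimal places the first and fifth coefficients agree with the values 0.55 and -0.01 found above.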


The Correlogram

30.14. The diagram obtained by graphing $r_k$ as ordinate against k as abscissa and
joining the points each to the next is called a correlogram. We shall give a number of
examples below and shall see that the form of the correlogram provides a method of
discriminating between the various types of oscillatory series.


30.15. Suppose, for example, that the series is generated by a moving average of
random elements with weights $a_1, a_2, \ldots, a_m$. The typical term of the series is then

$$u_j = a_1 \varepsilon_j + a_2 \varepsilon_{j+1} + \ldots + a_m \varepsilon_{j+m-1}. \tag{30.9}$$

Without loss of generality we may take $E(\varepsilon) = 0$ and hence $E(u_j) = 0$. Then

$$E(u_j u_{j+k}) = E\{(a_1 \varepsilon_j + a_2 \varepsilon_{j+1} + \ldots + a_m \varepsilon_{j+m-1})(a_1 \varepsilon_{j+k} + a_2 \varepsilon_{j+k+1} + \ldots + a_m \varepsilon_{j+k+m-1})\}.$$

Since

$$E(\varepsilon_j \varepsilon_{j+k}) = 0, \quad k \neq 0; \qquad = v, \text{ say, if } k = 0,$$

we have

$$E(u_j u_{j+k}) = (a_1 a_{k+1} + a_2 a_{k+2} + \ldots + a_{m-k} a_m)\, v, \tag{30.10}$$

provided that m > k. But if k ≥ m then

$$E(u_j u_{j+k}) = 0. \tag{30.11}$$

Thus for an infinite series generated by the moving average the serial correlations vanish
for k ≥ m, and the correlogram from that point onwards coincides with the x-axis. In
particular, if the a's are all equal to 1/m, we have

$$E(u_j u_{j+k}) = (m - k)\frac{v}{m^2}$$

and hence

$$r_k = 1 - \frac{k}{m}, \qquad k \le m, \tag{30.12}$$

so that the correlogram consists of a straight line joining the point (0, 1) to (m, 0), together
with the x-axis from the latter point onwards.
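The straight-line correlogram of an equally weighted moving average is readily confirmed by experiment. The following sketch is illustrative only: the seed, the series length, the normal random element and the function names are assumptions, and the autocorrelation is the simplified form about the overall mean rather than the segment-by-segment form (30.7).

```python
import random


def moving_average(noise, weights):
    """Series (30.9): each term is a weighted sum of m consecutive random elements."""
    m = len(weights)
    return [sum(w * noise[j + i] for i, w in enumerate(weights))
            for j in range(len(noise) - m + 1)]


def sample_autocorrelation(series, k):
    """Lag-k autocorrelation about the overall mean (adequate for long series)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[j] - mean) * (series[j + k] - mean) for j in range(n - k))
    return cov / var


rng = random.Random(12345)                     # assumed seed, for repeatability
m = 5
noise = [rng.gauss(0.0, 1.0) for _ in range(40000)]
series = moving_average(noise, [1.0 / m] * m)
observed = [sample_autocorrelation(series, k) for k in range(8)]
expected = [max(0.0, 1.0 - k / m) for k in range(8)]   # the line 1 - k/m, then zero
```

With a series of this length the observed values should lie close to the theoretical line, and those for k ≥ m close to zero.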
Example 30.2

The weights of the Spencer 21-point formula are

$$\frac{1}{350}\,[-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60, \ldots].$$

Apart from the divisor 350, which may be disregarded for present purposes, the sum of
squares of weights is 17,542. The products (30.10) and the corresponding serial correlations
are as follows :—

  k.    Σ a_j a_{j+k}.    r_k.          k.    Σ a_j a_{j+k}.    r_k.
  0        17,542         1.000         11        -930         -0.053
  1        16,786         0.957         12        -528         -0.030
  2        14,667         0.836         13        -214         -0.012
  3        11,584         0.660         14         -27         -0.002
  4         8,085         0.461         15          50          0.003
  5         4,726         0.269         16          59          0.003
  6         1,951         0.111         17          40          0.002
  7             6         0.000         18          19          0.001
  8        -1,074        -0.061         19           6          0.000
  9        -1,430        -0.082         20           1          0.000
 10        -1,298        -0.074         21           0          0.000

Fig. 30.3.—Correlogram of Series generated by the Spencer 21-point Formula (Example 30.2).

The correlogram is shown in Fig. 30.3. From k = 13 onwards the correlations are very
small, and from k = 21 onwards they vanish completely.
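The products of Example 30.2 follow directly from (30.10). A short sketch, with hypothetical names, in which the full set of 21 weights is built from the symmetry of the formula:

```python
def lagged_products(weights, max_lag):
    """Sum over j of a[j]*a[j+k] for k = 0..max_lag, as in (30.10) with v = 1."""
    return [sum(weights[j] * weights[j + k] for j in range(len(weights) - k))
            for k in range(max_lag + 1)]


half = [-1, -3, -5, -5, -2, 6, 18, 33, 47, 57]
spencer = half + [60] + half[::-1]        # 21 symmetric weights, divisor 350 omitted
products = lagged_products(spencer, 21)
correlations = [p / products[0] for p in products]
```

The divisor 350 cancels in the ratio, which is why it may be disregarded.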
30.16. Suppose now that the series consists of a sine term $A \sin\theta t$ plus $\varepsilon_t$, a random
residual. As before, we may suppose $E(u) = 0$, and hence

$$E(u_j u_{j+k}) = E\{(A \sin\theta j + \varepsilon_j)(A \sin\theta(j+k) + \varepsilon_{j+k})\} = A^2 E\{\sin\theta j \, \sin\theta(j+k)\} \tag{30.13}$$

$$= \frac{A^2}{n}\sum_{j=1}^{n}\{\sin\theta j \, \sin\theta(j+k)\} = \frac{A^2}{2n}\sum_{j=1}^{n}\{\cos\theta k - \cos\theta(2j+k)\}$$

$$= \frac{A^2}{2}\cos\theta k - \frac{A^2}{2n}\,\frac{\cos\theta(k+n+1)\,\sin n\theta}{\sin\theta}. \tag{30.14}$$

Thus for large n we have effectively, unless $\theta$ is small,

$$E(u_j u_{j+k}) = \frac{A^2}{2}\cos\theta k = B\cos\theta k, \text{ say}. \tag{30.15}$$

Similarly we find

$$E(u_j^2) = B + \operatorname{var}\varepsilon = C, \text{ say}. \tag{30.16}$$

Hence

$$r_k = \frac{B}{C}\cos\theta k, \qquad k \neq 0. \tag{30.17}$$

In short, for an infinite cyclical series the correlogram itself is a harmonic with period
equal to that of the original harmonic component.


30.17. When the original series is the sum of several harmonic terms the formula
for $r_k$ will, in general, be the sum of harmonics, not necessarily with the same periods.
Thus the correlogram will present a sinusoidal form which will not degenerate to the x-axis
after some fixed point and will not, in fact, be damped.
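For a single harmonic the undamped character is seen at once on evaluating (30.17). The sketch below uses hypothetical names; the amplitude 10 and the rectangular residual of range -5 to +5 are those of Table 30.3, and the period of ten units is an assumption inferred from that table.

```python
import math


def harmonic_correlogram(amplitude, theta, var_e, max_lag):
    """r_k = (B/C) cos(theta*k) for k != 0, with B = A^2/2 and C = B + var(e)."""
    b = amplitude ** 2 / 2.0       # B of (30.15)
    c = b + var_e                  # C of (30.16)
    return [1.0] + [(b / c) * math.cos(theta * k) for k in range(1, max_lag + 1)]


theta = 2 * math.pi / 10                    # assumed period of ten units
var_e = (5.0 - (-5.0)) ** 2 / 12.0          # variance of a rectangular variable
r = harmonic_correlogram(10.0, theta, var_e, 30)
```

The peaks at k = 10, 20, 30 are all of the same height B/C, so that the correlogram neither vanishes nor damps.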


30.18. Consider now the series defined by (30.3), namely

$$u_{t+2} = -a u_{t+1} - b u_t + \varepsilon_{t+2}.$$

This is a difference equation which is easily solved by the usual methods.* The general
solution of

$$u_{t+2} + a u_{t+1} + b u_t = 0 \tag{30.18}$$

is

$$u_t = p^t (A \cos\theta t + B \sin\theta t), \tag{30.19}$$

where

$$p = \sqrt{b}, \qquad \cos\theta = -\frac{a}{2\sqrt{b}}. \tag{30.20}$$

Here $\sqrt{b}$ is to be taken with positive sign, and it is assumed that $4b > a^2$. We also assume
that $\sqrt{b}$ is not greater than unity. The contrary case is mathematically permissible, but
it implies that $u_t$ increases without limit, which is outside the domain of our consideration.

* See, for instance, Milne-Thomson, Calculus of Finite Differences, chapter 13.
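That (30.19) with the constants (30.20) does satisfy (30.18) may be verified numerically. A minimal sketch, with hypothetical function names and arbitrary constants A and B; the coefficients are those of (30.4).

```python
import math


def damping_and_angle(a, b):
    """p and theta of (30.20); requires 4b > a^2 (oscillatory case)."""
    if 4 * b <= a * a:
        raise ValueError("oscillatory solution requires 4b > a^2")
    p = math.sqrt(b)
    theta = math.acos(-a / (2 * p))
    return p, theta


def homogeneous_solution(a, b, big_a, big_b, t):
    """u_t = p^t (A cos(theta t) + B sin(theta t)), equation (30.19)."""
    p, theta = damping_and_angle(a, b)
    return p ** t * (big_a * math.cos(theta * t) + big_b * math.sin(theta * t))


a, b = -1.1, 0.5
# Left-hand side of (30.18) evaluated along the solution; it should vanish.
residuals = [
    homogeneous_solution(a, b, 2.0, -1.0, t + 2)
    + a * homogeneous_solution(a, b, 2.0, -1.0, t + 1)
    + b * homogeneous_solution(a, b, 2.0, -1.0, t)
    for t in range(20)
]
```

For these coefficients p is about 0.707 and the period 2π/θ a little over 9 units.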
Consider now the series

$$u_t = \sum_{j=0}^{\infty} \xi_j \varepsilon_{t-j+1}, \tag{30.21}$$

where $\xi_t$ is a particular solution of (30.18), of the form (30.19), such that $\xi_0 = 0$ and $\xi_1 = 1$, i.e. such that

$$\xi_t = \frac{2}{\sqrt{4b - a^2}}\, p^t \sin\theta t. \tag{30.22}$$

On substituting (30.21) in the original equation it will be found to provide a particular
solution. The general solution is then

$$u_t = p^t (A \cos\theta t + B \sin\theta t) + \sum_{j=0}^{\infty} \xi_j \varepsilon_{t-j+1}. \tag{30.23}$$

As p is not greater than unity we shall, in general, find that the first term in this expression
is damped out of existence. If we may regard our series as having been "started up"
some time prior to the point t = 0, the solution is effectively

$$u_t = \sum_{j=0}^{\infty} \xi_j \varepsilon_{t-j+1}. \tag{30.24}$$
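The equivalence of the recurrence and the weighted sum of past disturbances can be tested on a short artificial series. In this sketch the system is assumed to start from rest, so that the sum in (30.21) stops at the first disturbance; the seed and the names are hypothetical.

```python
import math
import random


def xi(a, b, j):
    """Weight xi_j of (30.22): xi_0 = 0, xi_1 = 1."""
    p = math.sqrt(b)
    theta = math.acos(-a / (2 * p))
    return 2.0 / math.sqrt(4 * b - a * a) * p ** j * math.sin(theta * j)


a, b = -1.1, 0.5
rng = random.Random(7)                       # assumed seed
eps = [rng.uniform(-9.5, 9.5) for _ in range(40)]

# Recurrence (30.3), started from rest (u = 0 before the first disturbance).
u = []
for t, e in enumerate(eps):
    u1 = u[t - 1] if t >= 1 else 0.0
    u2 = u[t - 2] if t >= 2 else 0.0
    u.append(-a * u1 - b * u2 + e)

# Weighted sum of disturbances: xi_1 multiplies the newest, xi_2 the one before, etc.
v = [sum(xi(a, b, j) * eps[t - j + 1] for j in range(1, t + 2))
     for t in range(len(eps))]
```

The two series agree to rounding error, which is the content of the statement that (30.21) provides a particular solution.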


30.19. In this form the autoregressive scheme is seen to be a moving average of
a component $\varepsilon$ with infinite extent and damped harmonic weights. Consider now its
correlogram. We have

$$\sum_{j=0}^{\infty} \xi_j \xi_{j+k} = \frac{4}{4b - a^2}\sum_{j=0}^{\infty}\{p^{2j+k} \sin\theta j \, \sin\theta(j+k)\}$$

$$= \frac{2p^k}{4b - a^2}\sum_{j=0}^{\infty}\big[p^{2j}\{\cos\theta k - \cos\theta(2j+k)\}\big]$$

$$= \frac{2p^k}{4b - a^2}\Big\{\frac{\cos\theta k}{1 - p^2} - \frac{\cos\theta k - p^2 \cos\theta(k-2)}{1 - 2p^2 \cos 2\theta + p^4}\Big\}. \tag{30.25}$$

Now

$$E(u_j u_{j+k}) = E\Big\{\sum_i (\xi_i \varepsilon_{j-i+1}) \sum_i (\xi_i \varepsilon_{j+k-i+1})\Big\} = \operatorname{var}\varepsilon \sum_{j=0}^{\infty} \xi_j \xi_{j+k}. \tag{30.26}$$

Thus

$$r_k = \frac{\operatorname{var}\varepsilon \sum_{j=0}^{\infty} \xi_j \xi_{j+k}}{\operatorname{var}\varepsilon \sum_{j=0}^{\infty} \xi_j^2},$$

which, on substitution from (30.25), reduces to

$$r_k = \frac{p^k}{(1 + p^2)\sin\theta}\{\sin(k+1)\theta - p^2 \sin(k-1)\theta\}. \tag{30.27}$$

Writing

$$\tan\psi = \frac{1 + p^2}{1 - p^2}\tan\theta, \tag{30.28}$$

we find

$$r_k = \frac{p^k \sin(k\theta + \psi)}{\sin\psi}. \tag{30.29}$$

From this we see that the correlogram will oscillate with period $2\pi/\theta$, but that, owing to
the factor $p^k$, it will be damped. If k is negative the formula applies, except that |k|
must be used instead of k on the right-hand side of (30.29).
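Formula (30.29) can be checked against an independent calculation. The sketch below compares it with the recurrence $r_k = -a r_{k-1} - b r_{k-2}$, $r_1 = -a/(1+b)$, which is the standard relation for the autocorrelations of such a scheme and is not taken from the text; the function names are hypothetical.

```python
import math


def correlogram_closed_form(a, b, max_lag):
    """r_k = p^k sin(k*theta + psi) / sin(psi), equations (30.28)-(30.29)."""
    p = math.sqrt(b)
    theta = math.acos(-a / (2 * p))
    # tan(psi) = (1 + p^2)/(1 - p^2) * tan(theta); atan2 also handles p = 1.
    psi = math.atan2((1 + p * p) * math.sin(theta), (1 - p * p) * math.cos(theta))
    return [p ** k * math.sin(k * theta + psi) / math.sin(psi)
            for k in range(max_lag + 1)]


def correlogram_recursive(a, b, max_lag):
    """Same correlogram from r_k = -a r_{k-1} - b r_{k-2} (assumed standard result)."""
    r = [1.0, -a / (1 + b)]
    for k in range(2, max_lag + 1):
        r.append(-a * r[k - 1] - b * r[k - 2])
    return r[:max_lag + 1]
```

For a = -1.1, b = 0.5 the two agree to rounding error at every lag, and both show the oscillation dying away as $p^k$.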


30.20. We thus reach the interesting conclusion that the three types of series
considered in 30.9, however similar to the eye, will have distinct types of correlogram,
provided that the series are long enough for the observed correlations to approach the expected
values for an infinite series. The correlogram of a series generated by moving averages,
though it may oscillate as in Example 30.2, will vanish after a certain point; that of a
series of harmonic terms will oscillate, but will not vanish or be damped; that of the
autoregressive scheme will oscillate and will not vanish, but it will be damped. The correlogram
therefore offers a theoretical basis for discriminating between the three types of oscillatory
series.


30.21. Unfortunately the series with which we have to work are very frequently 
too short to enable a decisive distinction to be made. We shall see below that divergence 
between theory and observation can be very considerable, and that sampling theory has 
not yet advanced far enough to enable us to make objective judgments in probability 
about its significance. We shall have to rely on limited experimental evidence and to 
some extent on intuitive judgment in reaching conclusions. If, therefore, the remainder 
of this chapter contains gaps in the treatment and leaves certain points undecided the 
reader will understand that the reason is ignorance rather than indifference. 


Examples of Correlograms from Observed Series 


30.22. We will in the first place give the correlograms of a few of the series given 
earlier in this and the preceding chapter. 


Example 30.3

In Table 30.2 we gave the deviations from the trend of marriage rates for the years
1843-1896. The first 20 serial correlations of this series are shown in Table 30.5 and the
correlogram in Fig. 30.4.

TABLE 30.5

Serial Correlations of the Marriage Data of Table 30.2.

  Order of                     Order of
Correlation       r_k.       Correlation       r_k.
    k.                           k.
     1           0.563           11          -0.080
     2          -0.089           12          -0.136
     3          -0.498           13          -0.132
     4          -0.631           14          -0.058
     5          -0.467           15          -0.095
     6          -0.025           16          -0.126
     7           0.353           17          -0.036
     8           0.396           18           0.131
     9           0.254           19           0.209
    10           0.104           20           0.205

Fig. 30.4.—Correlogram of Marriage Data of Table 30.2 (Table 30.5).


The correlogram is smooth and suggests the operation of an autoregressive scheme. 
There is little indication that a moving average, at least of extent less than 20, would account 
for the series, but on the other hand some damping appears to be present. 


Example 30.4

Table 30.6 shows the first 60 serial correlations of the Beveridge series of Table 30.1,
the correlogram being given in Fig. 30.5.

TABLE 30.6

Serial Correlations of the Beveridge Wheat-Price Index of Table 30.1.

  k.      r_k.        k.      r_k.        k.      r_k.        k.      r_k.
   1     0.562        16     0.158        31     0.060        46    -0.036
   2     0.103        17     0.109        32    -0.008        47    -0.013
   3    -0.075        18     0.002        33    -0.039        48     0.042
   4    -0.092        19    -0.075        34     0.007        49     0.062
   5    -0.082        20    -0.062        35     0.056        50     0.065
   6    -0.136        21    -0.021        36     0.010        51     0.050
   7    -0.211        22    -0.062        37    -0.004        52     0.009
   8    -0.261        23    -0.088        38    -0.015        53   [illegible]
   9    -0.192        24    -0.084        39    -0.047        54   [illegible]
  10    -0.070        25    -0.076        40    -0.047        55    -0.073
  11    -0.003        26    -0.091        41     0.008        56    -0.106
  12    -0.015        27    -0.052        42     0.034        57    -0.084
  13    -0.012        28    -0.032        43     0.065        58    -0.019
  14     0.047        29    -0.012        44     0.099        59     0.003
  15     0.101        30     0.059        45     0.009        60     0.010

Fig. 30.5.—Correlogram of the Beveridge Series of Table 30.1 (Table 30.6).


The correlogram here is almost certainly damped. The oscillations persist in a most
remarkable way, notwithstanding the diminishing amplitude, and the presumption is
a strong one that the series is of the damped type.

Example 30.5

In Table 29.8 (page 386) we gave the residuals of a sheep-population series for the
years 1871 to 1935. Table 30.7 shows the first 30 serial correlations of this series and
Fig. 30.6 the correlogram. Again the correlogram is oscillatory, but the damping is not
so clear.

TABLE 30.7

Serial Correlations of the Sheep Data of Table 29.8.

  k.      r_k.        k.      r_k.        k.      r_k.
   1     0.595        11    -0.142        21    -0.381
   2    -0.151        12    -0.172        22    -0.118
   3    -0.601        13    -0.186        23     0.173
   4    -0.537        14    -0.128        24     0.343
   5    -0.138        15     0.052        25     0.352
   6     0.144        16     0.276        26     0.154
   7     0.203        17     0.439        27    -0.203
   8     0.118        18     0.293        28    -0.456
   9     0.006        19    -0.074        29    -0.415
  10    -0.078        20    -0.359        30    -0.184

Fig. 30.6.—Correlogram of the Sheep Population Data of Table 29.8 (Table 30.7).
Significance of a Correlogram

30.23. The foregoing examples illustrate one of the main difficulties we have to face
in correlogram analysis. On intuitive grounds we seem to be justified in rejecting the
scheme of moving averages as a possible scheme for the series of these examples, since the
oscillations in the correlograms persist; but we can no doubt find moving averages which
will produce such correlograms, though their extents would have to be long (over 60 in
the case of the Beveridge series) and their weights artificial. The only final test seems to
be to ascertain such a moving average and then to examine whether it will predict further
terms in the series if such can be observed.


30.24. Distinction between the scheme of harmonic components and the autoregressive
scheme is even more difficult for short series, since the correlograms for the
latter do not damp out according to expectation. Consider in fact an autoregressive
scheme of the simple linear type (30.3). There will be the usual variation in length from
peak to peak and in amplitude; but if the section of the series is a comparatively short
one, covering, say, four or five oscillations, the oscillations will not have time to get very
much out of step and the serial correlations will be systematically larger than one would
expect for an infinite series. This effect is exhibited in Table 30.8 and Fig. 30.7, which
give the serial correlations and the correlogram for the series of Table 30.4, given by the
formula

$$u_{t+2} = 1.1 u_{t+1} - 0.5 u_t + \varepsilon_{t+2}.$$

Here the damping factor $p = \sqrt{b} = 0.7071$, and by the thirtieth correlation $r_{30}$ should be
very small, less than 0.002 in absolute magnitude. Actually it is 100 times as large. The
mere fact that an observed correlogram for a short series fails to damp very rapidly is
not, therefore, a very definite indication that the series is not ruled by the autoregressive
scheme. On the contrary, failure to damp may be expected.
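The extent of the discrepancy can be gauged by setting the observed coefficients beside the envelope $p^k/\sin\psi$ of the theoretical correlogram (30.29). A sketch with a hypothetical function name; the observed values are those of Table 30.8.

```python
import math


def envelope(a, b, k):
    """Upper bound p^k / |sin(psi)| on |r_k| implied by (30.28) and (30.29)."""
    p = math.sqrt(b)
    theta = math.acos(-a / (2 * p))
    psi = math.atan2((1 + p * p) * math.sin(theta), (1 - p * p) * math.cos(theta))
    return p ** k / abs(math.sin(psi))


observed = {10: 0.05, 20: 0.22, 30: 0.27}          # from Table 30.8
bounds = {k: envelope(-1.1, 0.5, k) for k in observed}
excess = {k: abs(observed[k]) / bounds[k] for k in observed}
```

At k = 30 the bound is of the order of a few units in the fifth decimal place, so that the observed 0.27 exceeds it by a very large factor.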


TABLE 30.8

Serial Correlations of the Artificial Series of Table 30.4.

  k.      r_k.        k.      r_k.        k.      r_k.
   1      0.70        11     -0.05        21      0.05
   2      0.29        12     -0.17        22     -0.12
   3      0.01        13     -0.27        23     -0.28
   4     -0.17        14     -0.31        24     -0.43
   5     -0.27        15     -0.30        25     -0.57
   6     -0.25        16     -0.18        26     -0.56
   7     -0.13        17      0.12        27     -0.26
   8      0.07        18      0.29        28      0.02
   9      0.12        19      0.33        29      0.17
  10      0.05        20      0.22        30      0.27

Fig. 30.7.—Correlogram of the Artificial Series of Table 30.4 (Table 30.8).

30.25. We are on firmer ground when considering the significance of a correlogram
in the sense of judging whether it can be derived from a random series.

(a) The variance of $r_k$ in a random series of n terms is approximately $1/(n-k)$, provided
that n is large. For, taking the mean of the series as zero,

$$E\Big\{\frac{1}{n-k}\sum_{j=1}^{n-k} u_j u_{j+k}\Big\}^2 = \frac{1}{(n-k)^2} E\Big\{\sum u_j^2 u_{j+k}^2 + 2\sum_{j \neq m} u_j u_{j+k} u_m u_{m+k}\Big\}$$

$$= \frac{1}{(n-k)^2} E \sum (u_j^2 u_{j+k}^2) = \frac{\operatorname{var}^2 u}{n-k}.$$

Hence, for large samples,

$$\operatorname{var} r_k = \frac{1}{n-k}\,\frac{\operatorname{var}^2 u}{\operatorname{var}^2 u} = \frac{1}{n-k}. \tag{30.30}$$

R. L. Anderson (1942) has recently given exact results for the significance of a serial
correlation.

(b) For our purposes, however, the important point is not whether a particular
coefficient is significant, but whether the oscillatory character of the correlogram as a whole
is so. Here we have to form an intuitive judgment, but it can hardly be doubted that
the undulations in Figs. 30.4 to 30.6 are not accidental. Something exists to be explained
as a systematic effect, though what that effect is may be more difficult to decide.
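The large-sample result (30.30) gives a rough yardstick for any single coefficient. By way of illustration, and subject to the caution in (b) that single coefficients are not the main point, the fourth serial correlation of the marriage data (Table 30.5, 54 yearly terms from 1843 to 1896) may be compared with its standard error under randomness; the function name is hypothetical.

```python
import math


def random_series_se(n, k):
    """Approximate standard error of r_k for a random series, from (30.30)."""
    return 1.0 / math.sqrt(n - k)


n_marriage = 54            # years 1843-1896 inclusive
r4 = -0.631                # Table 30.5
ratio = r4 / random_series_se(n_marriage, 4)
```

The coefficient is between four and five times its standard error, which agrees with the intuitive judgment that the undulations are not accidental.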


30.26. We shall proceed to study the autoregressive scheme and the scheme of
cyclical components in more detail, without prejudice for the time being to the question
as to which is the better representation in particular cases. This latter is not, in fact,
entirely a statistical matter, and we shall return to it in 30.39.

The Autoregressive Scheme

30.27. We consider in the first instance the simplified scheme of equation (30.3).
The theoretical correlogram for a series generated by this equation is of the damped type
given by (30.29),

$$r_k = \frac{p^k \sin(k\theta + \psi)}{\sin\psi},$$

where $2\pi/\theta$ is the autoregressive period of the regression equation and is given by

$$\cos\theta = -\frac{a}{2\sqrt{b}}.$$

The typical series of this kind has no "period" in the strict sense. The lengths from
peak to peak or from upcross to upcross vary in the characteristic way. It appears from
experiment (but has not, I think, been shown theoretically) that the distribution of
distances from peak to peak is of the unimodal type with a central value somewhere near
the mean distance between peaks; and similarly for troughs and upcrosses. In speaking
of the "period" of an autoregressive series we mean the central value of one of these
distributions. The question we have now to consider is whether this period is the same
as the autoregressive period $2\pi/\theta$ of the regression equation.


30.28. We have seen in 29.26 that the mean distance between upcrosses of the
series generated by the moving average whose weights are $\xi_1, \ldots, \xi_m$ is given by $2\pi/\phi$,
say, where

$$\cos\phi = \frac{\sum_{j=1}^{m-1} \xi_j \xi_{j+1}}{\sum_{j=1}^{m} \xi_j^2}.$$

Substituting for $\xi$ from (30.22) and using (30.25), we find

$$\cos\phi = \frac{\dfrac{2p}{4b - a^2}\Big\{\dfrac{\cos\theta}{1 - p^2} - \dfrac{\cos\theta\,(1 - p^2)}{1 - 2p^2\cos 2\theta + p^4}\Big\}}{\dfrac{2}{4b - a^2}\Big\{\dfrac{1}{1 - p^2} - \dfrac{1 - p^2\cos 2\theta}{1 - 2p^2\cos 2\theta + p^4}\Big\}} = \frac{2p\cos\theta}{1 + p^2} = -\frac{a}{1 + b}. \tag{30.31}$$

Thus the mean period as defined by upcrosses is

$$2\pi \Big/ \arccos\Big(\frac{-a}{1 + b}\Big), \tag{30.32}$$

whereas that for the autoregressive period of the equation is

$$2\pi \Big/ \arccos\Big(\frac{-a}{2\sqrt{b}}\Big). \tag{30.33}$$

30.29. The mean period between upcrosses is thus not the same as the autoregressive
period. The two are very close for many of the values of a and b arising in practice. For
instance, when b = 1 they are identical; when a = -1, b = 0.5 their ratio is 1.07. One
might infer that an estimate of the period of an autoregressive scheme can be obtained
from the correlogram, but this generalisation requires some important qualifications.

(a) Firstly, the ratio of (30.33) to (30.32) is not necessarily close to unity for values
of b in the neighbourhood of $a^2/4$, i.e. when $\theta$ is small and the autoregressive period is long.
Consider, for instance, the series generated by

$$u_{t+2} = 1.2 u_{t+1} - 0.4 u_t + \varepsilon_{t+2}.$$

We have

$$\cos\theta = -\frac{a}{2\sqrt{b}} = \frac{1.2}{2\sqrt{0.4}} = 0.9487,$$

$\theta = 18.4°$, period = 19.5 units. However, for $\phi$,

$$\cos\phi = \frac{1.2}{1.4} = 0.8571,$$

$\phi = 31°$, period = 11.6 units.

The mean distance between upcrosses, and a fortiori that between peaks, is very much
shorter than the autoregressive period.

(b) The mean distance between upcrosses may miss certain oscillations above or
below the x-axis, so that it overestimates the period between peaks or troughs. On the
other hand, the latter may include ripples on the main wave which we wish to ignore.
The reader can verify for himself, by constructing an autoregressive series by some such
formula as the above, how difficult it is to draw the line in particular cases. The difficulty,
however, must be faced, for it is precisely the kind which we meet in dealing with observed
series.

(c) Owing to the appearance of the phase angle $\psi$ in equation (30.29) the starting-point
of the correlogram (k = 0) is not to be regarded as a maximum. The period of the
correlogram is therefore to be calculated either by ignoring this point or by reference to
distances between troughs and upcrosses in the correlogram.
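The two periods (30.32) and (30.33) are simple to tabulate for any pair of constants. A minimal sketch, with hypothetical function names:

```python
import math


def autoregressive_period(a, b):
    """2*pi / arccos(-a / (2*sqrt(b))), equation (30.33)."""
    return 2 * math.pi / math.acos(-a / (2 * math.sqrt(b)))


def mean_upcross_period(a, b):
    """2*pi / arccos(-a / (1 + b)), equation (30.32)."""
    return 2 * math.pi / math.acos(-a / (1 + b))


# Ratio of (30.33) to (30.32) for some of the cases mentioned in the text.
cases = {(a, b): autoregressive_period(a, b) / mean_upcross_period(a, b)
         for a, b in [(-1.0, 1.0), (-1.0, 0.5), (-1.2, 0.4)]}
```

The ratio is unity for b = 1, about 1.07 for a = -1, b = 0.5, and rises to about 1.7 for a = -1.2, b = 0.4, where b is close to $a^2/4$.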


30.30. The equation

$$u_{t+2} + a u_{t+1} + b u_t = \varepsilon_{t+2}$$

may be regarded as expressing the regression of $u_{t+2}$ on $u_{t+1}$ and $u_t$, the term $\varepsilon_{t+2}$ being
a residual error. We may therefore estimate the constants a and b from the regression
equation of the observed series in the usual way. If we assume that the series is long enough
for end effects to be negligible in determining the variances of the finite series, then
$\operatorname{var} u_{t+2} = \operatorname{var} u_{t+1} = \operatorname{var} u_t$ and from the usual formulae for regressions we find

$$-a = \frac{r_1(1 - r_2)}{1 - r_1^2}, \tag{30.34}$$

$$-b = \frac{r_2 - r_1^2}{1 - r_1^2}. \tag{30.35}$$

This gives us the constants of the autoregressive scheme from the serial correlations.
It should, however, be realised that these estimates are rather sensitive to superposed
error of the type we refer to below (30.32), and it is therefore unsafe to estimate the
autoregressive period from them. The correlogram itself appears to be a safer guide on
this matter.
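Equations (30.34) and (30.35) translate directly into code. In the sketch below the names are hypothetical, and the helper giving the first two theoretical correlations of the scheme, $r_1 = -a/(1+b)$ and $r_2 = -a r_1 - b$, is a standard result assumed here in order to check that the estimates recover the constants exactly when the correlations are exact.

```python
def estimate_ar2(r1, r2):
    """Constants a and b of u[t+2] + a*u[t+1] + b*u[t] = e[t+2]
    from the first two serial correlations, (30.34) and (30.35)."""
    denom = 1 - r1 * r1
    a = -r1 * (1 - r2) / denom
    b = -(r2 - r1 * r1) / denom
    return a, b


def first_two_correlations(a, b):
    """Theoretical r1, r2 of the scheme (assumed standard relations)."""
    r1 = -a / (1 + b)
    r2 = -a * r1 - b
    return r1, r2
```

With r1 = 0.595 and r2 = -0.151, the leading entries of Table 30.7, `estimate_ar2` gives a close to -1.060 and b close to 0.782.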


Example 30.6

Consider again the sheep data of Table 30.7 and Fig. 30.6. Suppose we have decided,
from the appearance of the correlogram, to attempt to represent the series by an
autoregressive scheme.

In the first place, we have to inquire whether a scheme of the simple linear form (30.3)
is likely to be adequate. Would it, for example, be better to consider the more general form

$$u_{t+3} + a u_{t+2} + b u_{t+1} + c u_t = \varepsilon_{t+3},$$

or need we take into account curvilinear regressions such as

$$u_{t+2} + a u_{t+1} + a' u_{t+1}^2 + b u_t + b' u_t^2 = \varepsilon_{t+2}\,?$$

The first point can be elucidated by the use of partial and multiple correlations. The
following are the partial coefficients and the function of the multiple correlation $1 - R^2$
as determined by the continued product of $(1 - r^2)$ (cf. vol. I, equation 15.45,
p. 380) :—

Order of Partial    Value of Partial
  Correlation.        Correlation.       Π(1 - r²).
     12                  0.595             0.6460
     13.2               -0.782             0.2509
     14.23               0.097             0.2485
     15.234             -0.183             0.2402
     16.2345             0.031             0.2400
     17.23456            0.014             0.2400

Evidently no appreciable gain in representation is to be obtained by taking the regression
on more than the two preceding terms.
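The partial coefficients can be reproduced approximately from the serial correlations of Table 30.7. The sketch below uses the Durbin-Levinson recursion, which is a substitute for the formulae of Volume I and not the method of the text; it treats the variance as the same at every lag, so that agreement to about the third decimal place is all that should be expected. The function name is hypothetical.

```python
def partial_correlations(r):
    """Partial serial correlations of orders 1..K from r[0] = 1, r[1], ..., r[K],
    by the Durbin-Levinson recursion (a swapped-in technique, see lead-in)."""
    partials = []
    phi = []                                   # coefficients of the current regression
    for k in range(1, len(r)):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        pk = num / den
        phi = [phi[j] - pk * phi[k - 2 - j] for j in range(k - 1)] + [pk]
        partials.append(pk)
    return partials


sheep_r = [1.0, 0.595, -0.151, -0.601, -0.537, -0.138, 0.144]   # Table 30.7
partials = partial_correlations(sheep_r)

one_minus_r2 = []          # continued product of (1 - r^2)
product = 1.0
for pk in partials:
    product *= 1.0 - pk * pk
    one_minus_r2.append(product)
```

The continued product settles near 0.24 after the second term, which is the numerical form of the conclusion that two preceding terms suffice.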

The possibility as to better representation by taking curvilinear regressions may be
considered by drawing the scatter diagrams of $u_t$ on $u_{t+1}$ and $u_t$ on $u_{t+2}$. These are
shown in Fig. 30.8. It seems clear that there is an essential scatter in the data which no
ordinary polynomial can represent, and that curvilinear terms are unlikely to add anything
material to the linear regressions.

Fig. 30.8.—Scatter Diagrams of $u_t$ on $u_{t+1}$ (top figure), and $u_t$ on $u_{t+2}$ (bottom figure).

We conclude that if the data are of the autoregressive type it is unnecessary to
consider any more elaborate scheme than the simple type

$$u_{t+2} + a u_{t+1} + b u_t = \varepsilon_{t+2}.$$

For this series we have

$$r_1 = 0.595, \qquad r_2 = -0.151.$$

Hence

$$-a = \frac{r_1(1 - r_2)}{1 - r_1^2} = 1.060,$$

$$-b = \frac{r_2 - r_1^2}{1 - r_1^2} = -0.782.$$

The autoregression equation is

$$u_{t+2} = 1.060 u_{t+1} - 0.782 u_t + \varepsilon_{t+2}.$$

For the autoregressive period we have

$$\cos\theta = \frac{1.060}{2\sqrt{0.782}} = 0.599, \qquad \theta = 53.2°,$$

and hence the period is $360/53.2 = 6.8$ years.

Now in the correlogram (Fig. 30.6) there are peaks at k = 7, 17 and 25, giving a period
of about 9 years; and there are troughs at k = 3, 13, 21 and 28, giving a mean period
of 8.3 years. The autoregressive period as estimated from the correlogram is then between
8 and 9 years, whereas that given by the autoregression equation is 6.8 years, considerably
shorter.

Using the values of a and b found above, we have for the mean distance between
upcrosses,

$$\cos\phi = -\frac{a}{1 + b} = 0.5948, \qquad \phi = 53.5°,$$

giving a mean distance practically equal to the autoregressive period as shown by the
regression equation.

Finally, looking to the original series, we see that there are nine major peaks, the
first in 1874 and the last in 1932, so that the mean distance between peaks is 58/8 = 7.25
years; and nine upcrosses, the first between 1872 and 1873 and the last between 1930 and
1931, so that the mean distance between upcrosses is 58/8 = 7.25 years, the same as for peaks.
The upcross at 1876-7, however, is due to a temporary fall below the zero line, and had it
not occurred we should have found a mean distance of 8.3 years.

We have therefore reached this position: the mean period in the series itself appears
to be about 7.25 years; that given by the regression constants is 6.8 years; and that given
by the correlogram is about 8.5 years. These figures are scarcely close enough for comfort,
and further data would be required to arrive at a more accurate estimate of the mean
period. Nevertheless, they illustrate very well the kind of divergence which appears to
be more the rule than the exception in dealing with short series. We should expect the
correlogram to give a higher value than the series itself, for there may appear peaks or
upcrosses in the latter which are purely temporary fluctuations due to the casual element.
On the other hand, the regression constants appear to give consistently lower values for
the autoregressive period than the correlogram, an effect found by Yule (1927a) for sunspots,
Wold (1938a) for cost-of-living indices, and Kendall (1944a) in series of agricultural prices,
acreage and livestock populations.
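The reading of periods from the correlogram can be made mechanical, at least for a correlogram as smooth as that of the sheep data. The sketch below simply locates the interior turning points of Table 30.7 and averages their spacing; the function names are hypothetical, and no smoothing is attempted, so that ripples in a rougher correlogram would be counted as peaks.

```python
def turning_points(r):
    """Lags of strict local maxima and minima of r[0..K], end points excluded."""
    peaks, troughs = [], []
    for k in range(1, len(r) - 1):
        if r[k] > r[k - 1] and r[k] > r[k + 1]:
            peaks.append(k)
        elif r[k] < r[k - 1] and r[k] < r[k + 1]:
            troughs.append(k)
    return peaks, troughs


def mean_spacing(points):
    """Mean distance between successive turning points."""
    return (points[-1] - points[0]) / (len(points) - 1)


sheep_r = [1.0,
           0.595, -0.151, -0.601, -0.537, -0.138, 0.144, 0.203, 0.118, 0.006, -0.078,
           -0.142, -0.172, -0.186, -0.128, 0.052, 0.276, 0.439, 0.293, -0.074, -0.359,
           -0.381, -0.118, 0.173, 0.343, 0.352, 0.154, -0.203, -0.456, -0.415, -0.184]
peaks, troughs = turning_points(sheep_r)
```

The spacings of 9 and 8.3 years so obtained are those quoted in the example, against 6.8 years from the regression constants.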


30.31. Let us examine more closely the effect referred to at the end of the previous
example. Our autoregressive system is based on a random element $\varepsilon_{t+2}$ which is added to
the term $u_{t+2}$. We can therefore regard the value at time t + 2 as composed of two parts,
a systematic element expressed by $-a u_{t+1} - b u_t$, giving the effect of the past history of the
system at times t + 1 and t, together with a new random element peculiar to the moment.
This latter is random in the sense that it is casual and unpredictable; but once it has
occurred it is incorporated into the motion of the system and exerts an influence on future
history. It is therefore quite unlike an error of observation or a sampling error which
distorts the value of a particular member but does not affect the others.

Now suppose that such an error of observation is present, and let us represent it by
$\eta$. For long series this element will increase the variance of the observed values by var $\eta$,
but if it is independent of the remaining constituents of the series it will not affect the
covariances. Hence the serial correlations will all be reduced in a constant proportion c,
except of course $r_0$; and this, as we proceed to show, will affect the autoregressive period
as derived from the regression constants, in general shortening the period quite considerably.


30.32. If $r_1$ is reduced to $cr_1$ and $r_2$ to $cr_2$, the constants of the regression equations
are, from (30.34) and (30.35),

$$-a' = \frac{c r_1 (1 - c r_2)}{1 - c^2 r_1^2}, \tag{30.36}$$

$$-b' = \frac{c r_2 - c^2 r_1^2}{1 - c^2 r_1^2}. \tag{30.37}$$

The estimated autoregressive period is then determined by $\theta'$, given by

$$\cos\theta' = \frac{-a'}{2\sqrt{b'}} = \frac{c r_1 (1 - c r_2)}{2\sqrt{(1 - c^2 r_1^2)(c^2 r_1^2 - c r_2)}}.$$

Differentiating the logarithm of this expression and putting c = 1, we find

$$-\tan\theta \left(\frac{d\theta'}{dc}\right)_{c=1} = 1 - \frac{r_2}{1 - r_2} + \frac{r_1^2}{1 - r_1^2} - \frac{2r_1^2 - r_2}{2(r_1^2 - r_2)},$$

which reduces to

$$\tan\theta \left(\frac{d\theta'}{dc}\right)_{c=1} = -\frac{(1 + b)(3b^2 + b - a^2)}{2b\{(1 + b)^2 - a^2\}}. \tag{30.38}$$

Now $\tan\theta = \sqrt{(4b/a^2) - 1}$ and the period $P = 2\pi/\theta$. We then find

$$\left(\frac{dP}{dc}\right)_{c=1} = -\frac{P^2 a (1 + b)(3b^2 + b - a^2)}{4\pi b\{(1 + b)^2 - a^2\}\sqrt{4b - a^2}}. \tag{30.39}$$

This equation gives us an approximate idea of the change in the period P for small
changes in c near c = 1. For instance, with a = −1.5, b = 0.9 we find P = 9.7 units,
and from (30.39),

$$\left(\frac{dP}{dc}\right)_{c=1} = 16.5.$$

Thus, if c = 0.9, i.e. the variance of $\eta$ is about 10 per cent. of the total, the period will be
reduced by about 1.65 years, a substantial amount.
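The formula (30.39) can be checked numerically. The sketch below is an illustration only: it assumes that (30.34) and (30.35), which lie outside this passage, are the relations $r_1 + a + b r_1 = 0$ and $r_2 + a r_1 + b = 0$, and all function names are hypothetical. It compares the right-hand side of (30.39) with a central-difference estimate of dP/dc at c = 1; the values it prints may differ somewhat from the rounded figures quoted above.

```python
import math

def period(a, b):
    """Period P = 2*pi/theta, where cos(theta) = -a / (2*sqrt(b))."""
    theta = math.acos(-a / (2.0 * math.sqrt(b)))
    return 2.0 * math.pi / theta

def attenuated_period(a, b, c):
    """Period implied by (30.36)-(30.37) when r1 and r2 are both multiplied by c."""
    r1 = -a / (1.0 + b)   # assumed form of (30.34): r1 + a + b*r1 = 0
    r2 = -a * r1 - b      # assumed form of (30.35): r2 + a*r1 + b = 0
    denom = 1.0 - (c * r1) ** 2
    a_new = -c * r1 * (1.0 - c * r2) / denom
    b_new = ((c * r1) ** 2 - c * r2) / denom
    return period(a_new, b_new)

def slope_formula(a, b):
    """Right-hand side of (30.39)."""
    p = period(a, b)
    top = -p * p * a * (1.0 + b) * (3.0 * b * b + b - a * a)
    bottom = 4.0 * math.pi * b * ((1.0 + b) ** 2 - a * a) * math.sqrt(4.0 * b - a * a)
    return top / bottom

def slope_numeric(a, b, step=1e-5):
    """Central-difference estimate of dP/dc at c = 1."""
    upper = attenuated_period(a, b, 1.0 + step)
    lower = attenuated_period(a, b, 1.0 - step)
    return (upper - lower) / (2.0 * step)

a, b = -1.5, 0.9
print(period(a, b), slope_formula(a, b), slope_numeric(a, b))
# Shortening of the period at c = 0.9, computed without the linear approximation.
print(period(a, b) - attenuated_period(a, b, 0.9))
```

Since (30.39) is a first-order result, the last line is a useful reminder that the change for c as low as 0.9 need not equal one-tenth of the slope at c = 1.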


30.33. It is thus possible that the observed discrepancies between the autoregressive
periods as given by the regression constants and the correlogram may be due to superposed
random fluctuation which is not incorporated into the autoregressive scheme. This is
not the only possible explanation; for instance, in particular cases the disturbance function
$\varepsilon$ may not be random. The hypotheses to be considered in such a case, however, are so
complex that it is difficult to pursue a quantitative investigation without a wealth of
material; and this, unfortunately, is usually denied to us, at least in economic work.
Meteorological data are more numerous, and we may hope that further light will be thrown
on the autoregressive scheme by a re-examination of the material available in this field.


30.34. Consider now the more extended autoregression equation

$$u_{t+m} + a_1 u_{t+m-1} + a_2 u_{t+m-2} + \dots + a_m u_t = \varepsilon_{t+m}. \tag{30.40}$$

The explicit solution cannot be given in the simple form available when m = 2. It has,
in general, the solution

$$u_t = A_1 \alpha_1^t + A_2 \alpha_2^t + \dots + A_m \alpha_m^t + B, \tag{30.41}$$

where $\alpha_1, \dots, \alpha_m$ are the roots of

$$x^m + a_1 x^{m-1} + a_2 x^{m-2} + \dots + a_m = 0, \tag{30.42}$$

and B is a particular integral involving the $\varepsilon$'s. For the series to be oscillatory without
increasing indefinitely no term such as $\alpha^t$, where $\alpha$ is real and greater than unity, can appear.
Assuming this to be so, and assuming further that the series was "started up" some time
before t = 0, we reduce the solution to the particular integral B.

Choose a particular value $\xi_t$ of $\sum_{j=1}^{m} A_j \alpha_j^t$, such that

$$\begin{aligned}
\xi_0 &= 1, \\
\xi_1 + a_1 \xi_0 &= 0, \\
\xi_2 + a_1 \xi_1 + a_2 \xi_0 &= 0, \\
&\;\;\vdots \\
\xi_{m-1} + a_1 \xi_{m-2} + \dots + a_{m-1} \xi_0 &= 0.
\end{aligned} \tag{30.43}$$

This is always possible in general, for it imposes m conditions on the m constants A. Then
it will be found on substitution that a particular integral B is given by

$$u_t = \sum_{j=0}^{\infty} \xi_j \varepsilon_{t-j}, \tag{30.44}$$

a generalisation of (30.24). Our series may then be regarded as generated by a moving
average of infinite extent, the weights being combinations of damped harmonic and
exponential terms.
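The substitution referred to above can be imitated numerically. The following sketch (function names are hypothetical, and the series is assumed to start from rest at t = 0, so that only disturbances from t = 0 onwards enter) builds the weights $\xi_j$ from the conditions (30.43), continued by the homogeneous recurrence, and confirms that the moving average of the disturbances reproduces the series generated directly by the autoregression.

```python
def moving_average_weights(coeffs, count):
    """Weights xi_0..xi_{count-1} for coeffs = [a_1, ..., a_m].

    xi_0 = 1 and xi_k + a_1*xi_{k-1} + ... + a_m*xi_{k-m} = 0,
    terms with a negative subscript being omitted, as in (30.43).
    """
    xi = [1.0]
    for k in range(1, count):
        total = 0.0
        for i, a_i in enumerate(coeffs, start=1):
            if k - i >= 0:
                total += a_i * xi[k - i]
        xi.append(-total)
    return xi

def simulate_autoregression(coeffs, eps):
    """Generate u_t + a_1*u_{t-1} + ... + a_m*u_{t-m} = eps_t from rest."""
    u = []
    for t, e in enumerate(eps):
        value = e
        for i, a_i in enumerate(coeffs, start=1):
            if t - i >= 0:
                value -= a_i * u[t - i]
        u.append(value)
    return u

def moving_average(xi, eps):
    """u_t = sum_j xi_j * eps_{t-j}, with disturbances before t = 0 taken as zero."""
    return [sum(xi[j] * eps[t - j] for j in range(t + 1)) for t in range(len(eps))]

coeffs = [-1.1, 0.8, -0.25]                        # illustrative third-order scheme
eps = [((7 * t) % 11 - 5) / 5.0 for t in range(50)]  # arbitrary fixed disturbances
direct = simulate_autoregression(coeffs, eps)
averaged = moving_average(moving_average_weights(coeffs, len(eps)), eps)
print(max(abs(x - y) for x, y in zip(direct, averaged)))
```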


30.35. The correlogram of such a series may be determined by the following method,
due to Walker (1931). Multiply (30.40) by $u_{t-k}$ and sum. We find

$$r_{k+m} + a_1 r_{k+m-1} + a_2 r_{k+m-2} + \dots + a_m r_k = \frac{E(u_{t-k}\,\varepsilon_{t+m})}{\operatorname{var} u}. \tag{30.45}$$

Now $u_{t-k}$ depends only on $\varepsilon_{t-k}$ and terms with lower subscripts and hence is uncorrelated
with $\varepsilon_{t+m}$ for $k > -m$. Thus we have

$$r_{k+m} + a_1 r_{k+m-1} + \dots + a_m r_k = 0, \qquad k > -m. \tag{30.46}$$

If we multiply (30.40) by $u_{t+k+m}$ we find similarly

$$r_k + a_1 r_{k+1} + \dots + a_m r_{k+m} = \frac{E(u_{t+k+m}\,\varepsilon_{t+m})}{\operatorname{var} u}, \tag{30.47}$$

but the expression on the right no longer vanishes. In fact $u_{t+k+m}$ contains the term
$\xi_k \varepsilon_{t+m}$ and hence

$$r_k + a_1 r_{k+1} + \dots + a_m r_{k+m} = \xi_k \frac{\operatorname{var} \varepsilon}{\operatorname{var} u}, \qquad k \ge 0. \tag{30.48}$$

From (30.46) it follows that the serial correlation $r_k$ will be given by

$$r_k = \sum_{j=1}^{m} (A_j \alpha_j^k), \tag{30.49}$$

where the $\alpha$'s are the roots of (30.42) and the A's are constants to be determined from initial
conditions. Thus the correlogram will be the sum of terms which either decay exponentially
to zero ($\alpha$ real) or oscillate with a similar decay to zero ($\alpha$ complex). Walker (1931) has
used this result in an inquiry into a series of atmospheric pressures.
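Relations (30.46) and (30.48) lend themselves to a numerical check. The sketch below is illustrative only (hypothetical function names; the infinite moving average of (30.44) is truncated at a finite number of weights, which is an approximation that is harmless when the weights decay quickly). It derives the serial correlations from the weights $\xi_j$ and evaluates both sums.

```python
def weights(coeffs, count):
    """xi_0 = 1, then xi_k = -(a_1*xi_{k-1} + ... + a_m*xi_{k-m})."""
    xi = [1.0]
    for k in range(1, count):
        xi.append(-sum(a * xi[k - i]
                       for i, a in enumerate(coeffs, start=1) if k - i >= 0))
    return xi

def correlogram(coeffs, max_lag, count=500):
    """Return (r, var_u / var_eps), with r[k] for k = 0..max_lag."""
    xi = weights(coeffs, count)
    gamma = [sum(xi[j] * xi[j + k] for j in range(count - k))
             for k in range(max_lag + 1)]
    return [g / gamma[0] for g in gamma], gamma[0]

def backward_sum(coeffs, r, s):
    """r_s + a_1*r_{s-1} + ... + a_m*r_{s-m}, using r_{-k} = r_k.

    With s = k + m this is the left side of (30.46), which vanishes for s >= 1.
    """
    return r[s] + sum(a * r[abs(s - i)] for i, a in enumerate(coeffs, start=1))

def forward_sum(coeffs, r, k):
    """Left side of (30.48): r_k + a_1*r_{k+1} + ... + a_m*r_{k+m}."""
    return r[k] + sum(a * r[k + i] for i, a in enumerate(coeffs, start=1))

coeffs = [-1.1, 0.5]                       # illustrative second-order scheme
r, variance_ratio = correlogram(coeffs, 12)
xi = weights(coeffs, 5)
print([round(backward_sum(coeffs, r, s), 12) for s in range(1, 6)])
print([forward_sum(coeffs, r, k) - xi[k] / variance_ratio for k in range(5)])
```

Both printed lists should consist of values that are zero to within rounding error.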


The Autocorrelation Function 

30.36. If we have a series u(t) defined at every point of time in some range −h
to +h, we may define its variance as

$$\frac{1}{2h}\int_{-h}^{h} u^2(t)\,dt, \tag{30.50}$$

on the assumption that the mean value is zero, which does not limit our generality. Suppose
the series is reduced to standard measure by dividing throughout by the square root
of this variance. Then an evident generalisation of the serial correlation is given by

$$r(k) = \frac{1}{2h}\int_{-h}^{h} u(t)\,u(t+k)\,dt. \tag{30.51}$$

We shall call this the autocorrelation function. We can likewise regard it as defined when
h tends to infinity, provided that the limit on the right in (30.51) exists. It is to be noted
that r(k) is in that case an even function of k.


30.37. We shall also consider the function

$$R(k) = \int_{-\infty}^{\infty} u(t)\,u(t+k)\,dt, \tag{30.52}$$

when it exists. We have

$$\int_{-\infty}^{\infty} R(k)\,e^{ikp}\,dk = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{ikp}\,u(t)\,u(t+k)\,dt\,dk = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{ip(t+k)}\,u(t+k)\,e^{-ipt}\,u(t)\,dt\,dk.$$

The simple substitution t + k = q reduces this to

$$\int_{-\infty}^{\infty} e^{ipq}\,u(q)\,dq \int_{-\infty}^{\infty} e^{-ipt}\,u(t)\,dt.$$

Thus, if we write

$$\alpha(p) + i\beta(p) = \int_{-\infty}^{\infty} e^{ipq}\,u(q)\,dq, \tag{30.53}$$

we have

$$\int_{-\infty}^{\infty} R(k)\,e^{ikp}\,dk = \alpha^2(p) + \beta^2(p). \tag{30.54}$$

It follows, as is otherwise evident from the fact that R(k) is an even function, that the
imaginary part on the left of (30.54) vanishes, and we have

$$\int_{-\infty}^{\infty} R(k) \cos kp\,dk = \alpha^2(p) + \beta^2(p). \tag{30.55}$$

If, following the notation of characteristic functions, we write $\phi_R(p)$ for the integral on
the left in (30.54) and $\phi_u(p)$ for that on the right in (30.53), we have

$$\phi_R(p) = |\phi_u(p)|^2. \tag{30.56}$$

We may then put

$$\phi_u(p) = \sqrt{\phi_R}\;e^{i\mu}, \tag{30.57}$$

where $\mu$ is an arbitrary real function. We shall then have

$$u(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \phi_u(p)\,e^{-itp}\,dp = \frac{1}{2\pi}\int_{-\infty}^{\infty} \sqrt{\phi_R}\,\exp(i\mu - itp)\,dp. \tag{30.58}$$

Since u(t) must be real, the imaginary part vanishes and this is equivalent to

$$u(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \sqrt{\phi_R}\,\cos(\mu - tp)\,dp, \tag{30.59}$$

and $\mu$ must be an odd function of p. The result is due to Wiener (1930). It shows that
the autocorrelation function R does not uniquely determine u(t) because of the arbitrary
function $\mu$.
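A discrete analogue of (30.54) may make the argument more concrete. In the sketch below (an illustration, not part of the original derivation; names are hypothetical) the integrals are replaced by sums over a finite series taken as zero outside its range, in which case the identity holds exactly. It also shows two different series sharing the same R(k), which is the non-uniqueness remarked on above.

```python
import cmath

def lagged_products(u, k):
    """R(k) = sum over t of u[t]*u[t+k], the series being zero outside its range."""
    k = abs(k)
    return sum(u[t] * u[t + k] for t in range(len(u) - k))

def transform_of_R(u, p):
    """Sum over k of R(k)*exp(i*k*p), the discrete counterpart of the left of (30.54)."""
    n = len(u)
    return sum(lagged_products(u, k) * cmath.exp(1j * k * p)
               for k in range(-(n - 1), n))

def squared_modulus(u, p):
    """alpha^2 + beta^2, where alpha + i*beta = sum over t of u[t]*exp(i*p*t)."""
    z = sum(x * cmath.exp(1j * p * t) for t, x in enumerate(u))
    return abs(z) ** 2

u = [1.0, -0.5, 2.0, 0.3, -1.2]
reversed_u = u[::-1]          # a different series with the same lagged products
for p in (0.3, 0.7, 2.0):
    print(transform_of_R(u, p), squared_modulus(u, p), squared_modulus(reversed_u, p))
```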


30.38. Consider now the autocorrelation function r(k) as defined in (30.51). Let
us regard the series as defined but equal to zero outside the range −h to +h.
Then we have

$$r(k) = \frac{1}{2h}\int_{-\infty}^{\infty} u(t)\,u(t+k)\,dt = \frac{1}{2h}R(k), \tag{30.60}$$

where R and r are zero outside the range −2h to +2h. The foregoing results then continue
to hold with some modifications concerning factors in 2h. If we write

$$\phi_r(p) = \int_{-2h}^{2h} r(k)\,e^{ikp}\,dk = \frac{1}{2h}\int_{-\infty}^{\infty} R(k)\,e^{ikp}\,dk \tag{30.61}$$

and

$$\phi_u(p) = \frac{1}{2h}\int_{-h}^{h} u(t)\,e^{ipt}\,dt = \frac{1}{2h}\int_{-\infty}^{\infty} u(t)\,e^{ipt}\,dt, \tag{30.62}$$

then corresponding to (30.56) we have

$$\phi_r(p) = 2h\,|\phi_u(p)|^2. \tag{30.63}$$

We may now let h tend to infinity and observe that the results continue to hold under
certain general conditions, provided that the limits exist.


Example 30.7 
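Consider the series

$$u(t) = A_1 \sin(\lambda_1 t + \alpha_1) + A_2 \sin(\lambda_2 t + \alpha_2) + \dots + A_m \sin(\lambda_m t + \alpha_m).$$

For the variance we have

$$\lim_{h\to\infty} \frac{1}{2h}\int_{-h}^{h} u^2(t)\,dt = \lim_{h\to\infty} \frac{1}{2h}\int_{-h}^{h} \sum_j \{A_j \sin(\lambda_j t + \alpha_j)\}^2\,dt,$$

since the cross-product terms will contribute only a finite amount to the integral, and hence

$$= \lim_{h\to\infty} \frac{1}{2h}\int_{-h}^{h} \tfrac{1}{2}\sum_j \left[A_j^2 \{1 - \cos 2(\lambda_j t + \alpha_j)\}\right] dt = \tfrac{1}{2}\sum_j (A_j^2).$$

Similarly for u(t) u(t + k) we have

$$\lim_{h\to\infty} \frac{1}{2h}\int_{-h}^{h} \Big[\sum_j \{A_j \sin(\lambda_j t + \alpha_j)\}\Big]\Big[\sum_j \{A_j \sin(\lambda_j t + \lambda_j k + \alpha_j)\}\Big] dt$$

$$= \lim_{h\to\infty} \frac{1}{2h}\int_{-h}^{h} \tfrac{1}{2}\sum_j \left\{A_j^2 \left[\cos \lambda_j k - \cos\{\lambda_j (2t + k) + 2\alpha_j\}\right]\right\} dt = \tfrac{1}{2}\sum_j A_j^2 \cos \lambda_j k.$$

Thus

$$r(k) = \frac{\sum_j \{A_j^2 \cos(\lambda_j k)\}}{\sum_j A_j^2}.$$

The correlogram is the sum of a series of harmonics, like the original series, but the
coefficients are different and the harmonics are all in phase.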
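The result of this example can be compared with a direct numerical average. The sketch below is an illustration with arbitrarily chosen amplitudes, frequencies and phases (none of them from the text); it replaces the limit by a long but finite stretch of the series sampled at small steps, so agreement is only approximate.

```python
import math

def series(t, amplitudes, frequencies, phases):
    """u(t) = sum of A_j * sin(lambda_j * t + alpha_j)."""
    return sum(A * math.sin(lam * t + alpha)
               for A, lam, alpha in zip(amplitudes, frequencies, phases))

def sampled_autocorrelation(values, lag_steps):
    """Average lagged product divided by the average square."""
    n = len(values) - lag_steps
    covariance = sum(values[i] * values[i + lag_steps] for i in range(n)) / n
    variance = sum(v * v for v in values) / len(values)
    return covariance / variance

def harmonic_autocorrelation(k, amplitudes, frequencies):
    """r(k) = sum(A_j^2 * cos(lambda_j * k)) / sum(A_j^2)."""
    total = sum(A * A for A in amplitudes)
    return sum(A * A * math.cos(lam * k)
               for A, lam in zip(amplitudes, frequencies)) / total

amplitudes, frequencies, phases = [2.0, 1.0], [1.0, 2.3], [0.4, 1.1]
step = 0.05
values = [series(i * step, amplitudes, frequencies, phases) for i in range(100000)]
for lag_steps in (10, 40, 100):
    k = lag_steps * step
    print(k, sampled_autocorrelation(values, lag_steps),
          harmonic_autocorrelation(k, amplitudes, frequencies))
```

Changing the phases leaves the second column essentially unaltered, in keeping with the remark that the harmonics of the correlogram are all in phase.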


30.39. The idea underlying the autoregressive scheme of representing time-series 
may perhaps be best illustrated by an analogy. Imagine a motor-car proceeding along 
a horizontal road with an irregular surface. The car is fitted with springs which permit 
it to oscillate to some extent but are designed to damp out the oscillations as soon as the 
comfort of the passengers will permit. If the car strikes a bump or a pothole in the road 
the body will oscillate up and down for a time but will soon come to rest so far as vertical 
motion is concerned. If, however, it proceeds over a continual succession of bumps there 
will be continual oscillation of varying amplitude and distance between peaks. The oscillations
are continually renewed by disturbances, though the distribution of the latter along
the road may be quite random. The regularity of the motion is determined by the internal
structure of the car; but the existence of the motion is determined by external impulses.


30.40. It appears to me very plausible to suppose that oscillations in time-series 
are generated in this way. One does not have to postulate some external rhythmic influence 
which keeps the oscillation going, or to suppose that the system will oscillate without 
damping once it has been set in motion. Nor is it necessary to assume that the majority 
of the deviations between theory and observation are due to “errors” which exert no 
effect on the subsequent movement of the system. The reader, however, will have to 
form his own opinion on this matter.* We now proceed to examine an alternative scheme 
of representation in which the series is represented as a sum of (undamped) cyclic terms. 


Periodogram Analysis


30.41. It is well known that under certain general conditions a function f(t) can be
expanded in the Fourier series, valid in a certain range,

$$f(t) = a_0 + a_1 \cos\frac{2\pi t}{\lambda_1} + a_2 \cos\frac{4\pi t}{\lambda_1} + a_3 \cos\frac{6\pi t}{\lambda_1} + \dots + b_1 \sin\frac{2\pi t}{\lambda_1} + b_2 \sin\frac{4\pi t}{\lambda_1} + b_3 \sin\frac{6\pi t}{\lambda_1} + \dots \tag{30.64}$$

* The scheme considered in this chapter may over-simplify natural conditions in that it assumes
finite random disturbances at equidistant time-intervals. If the intervals are not equal, or if the
disturbances are small and continually occurring, the autoregressive scheme is only an approximation.
Much remains to be done on this subject.
Functions which are not periodic can be expanded in this way; for instance, in the
range $0 \le x < \pi$,

$$\frac{x}{2} = \sin x - \tfrac{1}{2}\sin 2x + \tfrac{1}{3}\sin 3x - \tfrac{1}{4}\sin 4x + \dots$$

The series, of course, repeats itself with period $2\pi$.
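The expansion just quoted is easily examined numerically. The sketch below (hypothetical function name; the number of terms is an arbitrary choice) sums the first few thousand terms and compares the result with x/2 inside the range, and with the repeated values outside it. Convergence is slow, so agreement is to only a few decimal places.

```python
import math

def sine_series_partial_sum(x, terms):
    """sin x - (1/2) sin 2x + (1/3) sin 3x - ... up to the given number of terms."""
    return sum((-1) ** (n + 1) * math.sin(n * x) / n for n in range(1, terms + 1))

for x in (0.5, 1.0, 2.0):
    inside = sine_series_partial_sum(x, 2000)
    shifted = sine_series_partial_sum(x + 2.0 * math.pi, 2000)
    print(x, x / 2.0, inside, shifted)
```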

As a representation of observed series the Fourier series is rather restricted in scope,
since the period of every term is a submultiple of the fundamental period $\lambda_1$. A more general
scheme is provided by the series

$$f(t) = a_0 + a_1 \cos\frac{2\pi t}{\lambda_1} + a_2 \cos\frac{2\pi t}{\lambda_2} + \dots + b_1 \sin\frac{2\pi t}{\lambda_1} + b_2 \sin\frac{2\pi t}{\lambda_2} + \dots \tag{30.65}$$

or the alternative form

$$f(t) = A_0 + A_1 \cos\left(\frac{2\pi t}{\lambda_1} + \alpha_1\right) + A_2 \cos\left(\frac{2\pi t}{\lambda_2} + \alpha_2\right) + \dots \tag{30.66}$$

Here the $\lambda$'s are not necessarily commensurable. The object of our analysis is first of all
to find out what are the best values of the $\lambda$'s to select, and secondly to evaluate the other
constants a and b, or A and $\alpha$.


30.42. Suppose we wish to test whether a time-series contains a harmonic term with
period $\mu$. Consider the series

$$A = \frac{2}{n}\sum_{j=1}^{n} u_j \cos\frac{2\pi j}{\mu}, \tag{30.67*}$$

$$B = \frac{2}{n}\sum_{j=1}^{n} u_j \sin\frac{2\pi j}{\mu}, \tag{30.68}$$

and write

$$S^2 = A^2 + B^2 = \frac{4}{n^2}\left\{\Big(\sum_{j=1}^{n} u_j \cos\frac{2\pi j}{\mu}\Big)^2 + \Big(\sum_{j=1}^{n} u_j \sin\frac{2\pi j}{\mu}\Big)^2\right\}. \tag{30.69}$$

Suppose that the series is in fact given by

$$u_j = a \sin\frac{2\pi j}{\lambda} + b_j, \tag{30.70}$$

where $b_j$ is a component which we will assume to contain no cyclical element, so that its
correlation with the other component is zero, at least for long series. Then we have

$$A = \frac{2a}{n}\sum_{j=1}^{n}\Big(\sin\frac{2\pi j}{\lambda}\cos\frac{2\pi j}{\mu}\Big) + \frac{2}{n}\sum_{j=1}^{n}\Big(b_j \cos\frac{2\pi j}{\mu}\Big),$$

and the second term may be neglected. Thus, writing

$$\alpha = \frac{2\pi}{\lambda}, \qquad \beta = \frac{2\pi}{\mu},$$

we have

$$A = \frac{2a}{n}\sum_{j=1}^{n} (\sin \alpha j \cos \beta j) = \frac{a}{n}\sum_{j=1}^{n} \{\sin(\alpha - \beta) j + \sin(\alpha + \beta) j\}$$

$$= \frac{a}{n}\left\{\frac{\sin \tfrac{1}{2}(\alpha - \beta) n \,\sin \tfrac{1}{2}(\alpha - \beta)(n + 1)}{\sin \tfrac{1}{2}(\alpha - \beta)} + \frac{\sin \tfrac{1}{2}(\alpha + \beta) n \,\sin \tfrac{1}{2}(\alpha + \beta)(n + 1)}{\sin \tfrac{1}{2}(\alpha + \beta)}\right\}. \tag{30.71}$$

For large n this remains small unless $\alpha$ approaches $\beta$ (or $-\beta$, which is essentially the same
situation), and in that case we have

$$A \sim a \sin \tfrac{1}{2}(\alpha - \beta)(n + 1).$$

Similarly,

$$B \sim a \cos \tfrac{1}{2}(\alpha - \beta)(n + 1),$$

so that

$$S^2 \sim a^2. \tag{30.72}$$

Thus S remains small unless the "trial" period $\mu$ approaches the real period $\lambda$, and in that
case equals the amplitude a.

* Some writers define these sums with j from 0 to n − 1. The signs of A and B may then differ
from those given by (30.67) and (30.68), but the intensity and phase are unaffected.
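The behaviour described by (30.72) can be seen in a small computation. The sketch below is an illustration only: the series is a pure sine term of amplitude 3 and period 12, the non-cyclical component $b_j$ being omitted for simplicity, and the function names are hypothetical. It evaluates A and B as in (30.67) and (30.68) for several trial periods.

```python
import math

def harmonic_sums(u, mu):
    """A and B of (30.67) and (30.68) for trial period mu; u[0] holds u_1."""
    n = len(u)
    A = 2.0 / n * sum(u[j - 1] * math.cos(2.0 * math.pi * j / mu) for j in range(1, n + 1))
    B = 2.0 / n * sum(u[j - 1] * math.sin(2.0 * math.pi * j / mu) for j in range(1, n + 1))
    return A, B

def intensity(u, mu):
    """S^2 = A^2 + B^2."""
    A, B = harmonic_sums(u, mu)
    return A * A + B * B

amplitude, true_period, n = 3.0, 12.0, 600
u = [amplitude * math.sin(2.0 * math.pi * j / true_period) for j in range(1, n + 1)]
for mu in (5.0, 7.3, 11.0, 12.0, 13.0, 20.0):
    print(mu, intensity(u, mu))
```

The intensity stays small except near the trial period 12, where it comes close to the square of the amplitude.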


30.43. Similarly we may expect that if the series consists of a sum of harmonics
with periods $\lambda_1, \lambda_2, \dots, \lambda_m$, S will be small, unless $\mu$ is equal to one of these periods, in
which case it is finite and equal to the amplitude of the term concerned.

This result forms the basis of what is known as periodogram analysis. We select
a number of trial periods for different values of $\mu$ and calculate $S^2$ for each of them. $S^2$,
which is called the intensity, is then exhibited as a function of $\mu$, and graphed as ordinate
against $\mu$ as abscissa. The diagram obtained by joining the points, each to the next, is
called the periodogram. If this figure has peaks at certain values $\lambda_1, \dots, \lambda_m$ and we are
prepared to assume that these are not sampling accidents, the values are the appropriate
periods of harmonic terms and the intensity $S^2$ provides the corresponding amplitudes.
The quantities A and B of (30.67) and (30.68) are obtained incidentally and provide the
phase angles $\alpha$ of (30.66). We shall illustrate the arithmetic processes below.


30.44. Fig. 30.9 shows the periodogram of the wheat-price index data of Table 30.1. 
In order not to confuse the diagram for lower values of the trial period we have shown 
only the major fluctuations. The length of the series was about 300 years from 1545 to 
1844, earlier and later figures shown in Table 30.1 not having been taken into account. 
The primary data have been taken from Sir William Beveridge's classical paper (1922) and 
are shown in Table 30.9. For practical reasons which will emerge presently, certain trial
periods are taken not over exactly 300 years but over the number N of years shown in
the table. To reduce the figures to comparability, Beveridge therefore multiplied the
sum $A^2 + B^2$ by $N/300$.

[Fig. 30.9. Periodogram of the Beveridge Wheat-Price Index (Table 30.9). Horizontal axis: Period (years).]
TABLE 30.9

Periodogram Analysis of the Beveridge Wheat-Price Index Data of Table 30.1.
(From J.R.S.S., 1922, 85, 412.)
The first observation relates to 1545, except where A and B are given in heavy type.

Period (Years) | Number of Years N | A | B | Intensity N(A² + B²)/300 | Period (Years) | Number of Years N | A | B | Intensity N(A² + B²)/300
2:000, | 300 | FOI — 0-01 2-667 | 312 | —0-92) + 1-20) 2:38 
2.049 | 336 | — 0-40| — 0-09 0-19 2-087 | 301 | 4+ 1-23) — 0-02| 1:52 
2.054 | 304 |--048| — 0-72 0:77 2:092 | 315 | —0:04| + 0:23 0-06 
2-061 | 340 | + 0-38| — 0-57 0-54 2-706 | 322 | —0-27| + 1-33 1-97 
2-069 | 300 | +0-25| + 0-63 0-46 2-714 | 304 | +0-83] 4+ 1-17 2-10 
2-074 | 336 | — 0-61) + 0:51 0-71 2-727 | 300 | + 0-86] + 1-46 2:87 
2-080 | 312 | + 0-92] — 0-50 1-14 2-733 | 987 Bn 6:16 
2-087 | 288 | —0-52| —0-11 0-27 2:735 | 979 7:82 
2-095 | 308 | —0-91| + 0-90 1:69 2-737 | 312 6-22 
2-105 | 320 | +0-90| + 0-07 0-86 2-741 | 296 5:86 
2-112 | 288 | +0-90| + 0-80 1:38 308 1:55 
2-133 | 320 | + 0-89| + 0-15 0-84 . 348 | — 0-57| — 0-04 0-37 
23154 | 308 | + 0-48] + 0-23 0-29 2-769 | 324 | 4+ 1-49| + 0-23 2.28 
2.182 | 288 | + 1-32] — 0-59 1:99 2-778 | 325 | + 1-20] — 0-92 248^ 
2-200 | 308 | —0-13| — 0-60 0-39 2-800 | 336 | —1-01| — 0:19 1-18 
2-222 | 320 | —0-32| — 0-62 0-52 2-818 | 310 | + 0-55] + 1-07 149 
2261 | 312 | + 0-50] — 0-22 0-31 2:833 | 323 | + 0-78] — 0-10 0-67 
2-286 | 320 | — 0-38] — 0-85 0-93 2:846 | 296 | + 0-41] + 0-42 0-34 
2-316 | 308 | + 1-39] — 1-05 3-11 2-857 | 320 | + 0-96] + 0-21 1:08 
2-333 | 308 | —0-10| — 0:25 0-08 2:875 | 322 | 4-0-35| + 0-14 0-15 
2.353 | 320 | + 0-90] + 0:07 0-86 2-888 | 312 | +1-51] +0-26 2-43 
2.364 | 312 | — 0-12] — 0-63 0-43 2-895 | 330 | — 0-69) — 1-57 3-21 
2-970 | 320 | + 0-05} — 0-28 0-08 2-900 | 320 | +070) — 1-11 1-84 
2:375 | 304 | + 0-29] — 0-43 0-27 2:933 | 308 | — 0-04] + 0:39 0-16 
2-381 | 300 | — 0-19] — 1-22 1:53 247 | 336 | — 0-93) — 1-19 2:57 
2-385 | 310 | — 1-00] — 0-89 1:86 2-960 | 296 | —0-00| — 1-15 1:30 
2:391 | 330 | — 1-30] — 0-54 2-18 3-000 | 300 | —0-29| — 0-39 0-23 
2-395 | 309 | —0-72| + 0-60 0-90 3-040 | 304 | + 0-09) + 0-75] 0-58 
2-400 | 312 | + 0-34| + 0-68 0-60 3:077 | 320 | +0-05| + I-18| 1-50 
2-412 | 328 | —0:08| — 0-65 0-47 3411 | 336 | + 0:91 — 0-44) 115 
2:417 | 348 | + 0-63] + 0-57 0-69 3:143 | 308 | + 2-01) + 0-23) 4-20 
2-435 | 336 | +0-44| + 0-01 0-22 3-167 | 304 | +0-46) — 1-05, 1-33 
2-452 | 304 | — 1-40] — 0-51 2.23 3.200 | 320 | +0-43| + 0-95 oe 
2-402 | 320 | — 0-25] + 1-49 2-44 3-217 | 296 | + 1-25] + 0-00 1:55 
2-476 | 312 | —0-38| + 0:35 0-27 3-250 | 312 | —1-22| — 0-47 1-80 
2483 | 288 | —0-07| + 0:74 0-53 3-273 | 324 | —0-55| + 1-18 1:82 
2-500 | 320 | — 0-24] + 1-19 1:56 3-286 | 322 | —0-11| +0-99), 1-07 
2:512 | 324 | + 0-86) + 0-39 0-97 3:304 | 304 |4-0:13| + 0-75 0-59 
2-516 | 312 | + 0-45| + 0-24 0-26 3-333 | 320 | + 0-90] + 1:58 3-54 
2-529 | 301 | — 0-19) — 0-31) 0-13 3:364 | 296 | +1-76| + 0-98 4-00 
2-545 | 336 | — 1339 — 0-81 2-89 3375 | 324 | 4-0:56| + 0-92 1:24 
2-555 | 322 | + 0-38] + 0-50 0-42 3-385 | 308 | +0-35) + 1-03 1-21 
2-571 | 306 | + 1-25| + 0-55 1-91 3400 | 393 | 4112| + 2-37 741 
2-588 | 308 | + 0-30) + 0-43 0-28 3407 | 276 | +2-98) + 2-81 14-90 
2-000 | .312 | + 1-02] — 0-39 1-25 3412 | 348 | 41-27 — 398 15:53 
2-818 | 306 | —0-75| — 0-24 0-63 3-417 | 328 |4308| —2:24| 15-84 
2.625 | 9294 | —0-45|+1-36] ` 2-01 3:429 | 288 |--311| — 140| 1146 
2-643 | 296 | + 0-95) — 0-62 1-27 3-444 | 310 | +0-09) — 0-99 1:08 
: zx 
TABLE 30.9 (continued)

Period (Years) | Number of Years N | A | B | Intensity N(A² + B²)/300 | Period (Years) | Number of Years N | A | B | Intensity N(A² + B²)/300

3-455 | 304 |4-0:55|--0:29| 0:39 4-933 | 296 | +1-57| + 1:58 4-91 
3-462 | 315 | +1:57|+ 102 4-87 5-000 | 300 | +1-85| + 1-00 4-30 
3-500 | 308 |--120| — 0-94 2:38 5-067 | 304 | — 0-05] + 3:98 16-09 
3.524 | 296 | +1-41] —1-18 3-31 5091 | 336 |— 0:73| + 5:55) 3505 
3-538 | 322 |+0-50|— 1-45 2:53 5-100 | 306 |+ 5:71|+ 2-98 49-94 
3:556 | 320 |+0-02| — 0-43 0-20 5111 | 322 | + 5-70| + 0-29 
3-571 | 325 | + 0-80] — 0-69 1-21 51125 | 328 | + 3-97] + 2:90 
3-000 | 324 | —1-03| + 0-82 1:88 5-143 | 324 | + 246| + 2-46 
3-619 | 304 | +1-18| + 1-23 2-94 5-200 | 312 | +0-02| + 0:30 
3-636 | 320 | +1-14| + 0-13 1:39 53250 | 294 | + 1-74] + 192 
3:043 | 306 | —0-16| + 0-27 0-10 5333 | 320 | +0-71| — 4-46 
3-067 | 308 | — 2-14] — 1-07 5:87 5400 | 324 |4- L04| 4+ 3-71 
3-070 | 309 | + 0-34] — 1-90 3-83 5415 | 325 | + 427 + 1:90 
3:692 | 288 | + 1:28] — 0-22 1-63 5-429 | 304 | + 4-72] — 0-28 
3-700 | 296 | + 0-90| — 0-59 1-18 5455 | 300 | +1-37| — 3-73 
3-714 | 312 | + 1-15] + 1-78 4-65 5:500 | 308 | — L04| + 1-49 
3-727 | 287 |—045| — 1-65 2-72 5-555 | 300 | + 2-40] — 0-68 
3550 | 315 | + 0-64| — 0.06 0-44 5-000 | 336 |4-0:46 + 1-21 
3-778 | 306 | —1-17| — 0:68 1:86 5:667 | 306 | +5-31| — 1-97 
3:800 | 304 | + 1-60] + 0-80 3:24 5-692 | 296 |--205| — 3-91 
3:833 | 322 | —112| — 163 417 5-714 | 320 | + 0-35] — 2-13 
3-857 | 324 | + 1-63| + 0-45 3-08 5-750 | 322 | + 1-39] — 0-33 
3:888 | 280 | —0-15| + 0-66 0-43 5:800 | 290 | + 3-55] — 275 
3:895 | 296 | —0-66) + 1-00 1-42 5:846 | 304 | + 0-00] — 2:29 
3:928 | 306 | + 0-64) — 1-61 3-06 5-933 | 356 | + 4-37] + 0-91 
3-962 | 309 | —0-67| + 1-74 3:59 6-000 | 300 |—3:50| — 0-12 
4000 | 300 |-147| — 113 3:64 6111 | 330 | — 0-79] — 1-90 
4077 | 318 | +0-57| — 0-26 0-41 6143 | 301 | + 0-74] — 2-96 
4111 | 296 | +1-13) — 1-70 413 6-167 | 296 | —0-22| — 2-94 
4149 | 290 | — 0-50] + 0-23 0-30 6-200 | 310 | — 2-02] — 3:38 
4107 | 325 | + 1-21] + 0-32 1-70 6-250 | 325 | — 3-23] — 0-11 
4173 | 322 -| +0-66| — 1-46 271 6-286 | 308 |— 172| — 0-59 
4900 | 294 | —0-99| — 0-41 1-02 6333 | 304 | — 1-52] + 1:29 
4-250 | 323 | + 0-50] — 2-73 8:32 6-400 | 320 | + 0-80] + 2-74 
4-286 | 300 | —0-65| + 0-79 1-04 6-500 | 312 | + 0-69] — 0-73 
4333 | 312 | — 1-50] — 1-30 410 6.571 | 322 |--149| — 0-77 
4:353 | 296 | — 2-85] — 0-24 8:05 6-007 | 320 |--025| + 0-21 
4304 | 288 | —2:98| + 0-75 9-07 6-727 | 296 | +0-08] — 0-13 
e4375 | 316 | — 2-47] + 0-87 719 6:750 | 324 | — 0:20| — 1-66 
4385 | 342 | — 0:50| + 2:55 7-72 6:800 | 306 | + 0-23] — 0-65 
4400 | 308 | — 138| + 3-27 12:89 6-909 | 304 | + 0-58] + 2:56 
4412 | 300 | + 0-08| + 3-62 13-11 6-933 | 312 | +1-68| + 2-01 
4417 | 318. | + 0-87] + 3-85 16-48 7-000 | 308 | + 3-10) — 2-17 
4-429 | 310 | + 1-80] + 2-41 9-32 7143 | 300 | + 1-83] — 1-86 
4-444 | 320 | + 2-15] + 0-83 5:66 7-200 | 324 | + 0-54] — 3-93 
4471 | 304 | +0-91| + 0-79 1-48 7-333 | 308 | + 1-52] — 2-81 
4-500 | 306 | + 1-87| + 0-72 4-09 7-400 | 296 | — 2-33) — 2-72 
4571 | 320 | — 0-21] + 0-04 0:22 7-417 | 356 | + 1-50) — 4-01 
4600 | 322 | — 0-08] + 1-24 1:65 7-429 ^ 312 |—380| — 1-49 
4-667 | 336 | + 0-19] + 0:93 1-00 7-500 | 315 | + 0-17] + 1-50 
4750 | 304 | — 0-12] + 2-98 5:28 7-600 | 304 | — 2-33] — 1-37 
4-800 | 288 |+ 9-44] + 1-08 6:84 7-667 | 322 |— 146| — 2-61 
4:857 | 308 |— L06| — 1-30 2-89 7-750 | 310 | +1-38| — 0:39 
4888 | 312 | —1-80| + 2-11 8-00 7-857 | 330 | — 0-50) + 0-28' 
TABLE 30.9 (continued)

Period (Years) | Number of Years N | A | B | Intensity N(A² + B²)/300 | Period (Years) | Number of Years N | A | B | Intensity N(A² + B²)/300
8-000 312 — 39-96 | + 1:34} 18-67 17-500 280 — 6:18| — 4-45 54:12 
8-091 356 + 4-32, — 0-98 23-23 18-000 306 — 4:40| + 1:25 21-29 
8-200 287 | + 1-62| — 0-64 2:90 18-500 296 — 1-46) + 2:25 7:10 
8-222 296 | + 0-19] — 0-56 0-34 19-000 304 + 1-00) — 0-23 1:07 
8:333 325 + 0-21| + 0-91 0-95 19-750 316 — 4:73| — 1-59 26:25 
8:500. 323 | + 0-17| + 3-19 10-41 20-000} 320 | —5-71) + 1-69 37:88 
8:667 312 + 2-01| — 1-01 7:59 21-000 294 + 0-78] + 2-61 7:28 
8-800 308 + 2-97) + 0-83 9-77 22-000 308 + 1-87) + 1:58 6-18 
9-000 306 = 1-51) — 0:57 2-65 23-000 322 — 2-45) — 1-43 8-61 
9-200 322 — 0:16) — 1-56 2-65 24-000. 288 + 0:45) + 5-19 26:10 
9-333 336 | — 0-74 + 0-64 1-08 24-667 296 | + 4-31] + 1-99 22-21 
9-500 304 --1:08| + 1-07 2-26 25-000 325 + 3:86| — 0-19 14-94 
9-667 290 + 5:03, + 0:37 24-55 26-000 312 + 1-23] — 1-34 3:43 
9-750 312 + 446| — 3-56 33:89 27-000 324 + 0:50| — 0-33 0-38 
9-818 324 | + 1-21] — 4-94 27-90 28-000} 308 | —0-49| + 0-68 0-72 
10-000 320 — 1-19} — 0-83 2-25 29-000 290 T 1:08| — 2:12 5:46 
10-200 306 + 0-86 | — 0:22 0:80 30-000 300 — 1-53) — 2:34 7-81 
10-250 328 — 0:69) + 1-10) 1-84 31-000 310 — 1-98] + 0:13 4-06 
10-400 312 + 1:88} — 1:65 6:52 32-000 320 — 0:37] + 0-51 0:42 
10:500 294 + 2:46| — 1-82 9-19 33-000 330 + 0:96) — 0-78 1-68 
10-750 301 + 1-47) — 3-13 11-98 34-000 306 — 3-00 | — 2-15 13-90 
10-800 324 + 1:00| — 4-75 25:48 35-000 280 — 4-64 | + 1-79 23-11 
11-000 308 — 9:85| — 4-26 33-84 36-000 288 — 1-65| + 4-85 23:29 
11-200} 336 | — 2-48) + 0-55 7:24 37-000 | 296 | 2-08| + 3-92 19-47 
11-500} 322 | —1-32| — 0-66 2-34 38-000} 304 | + 2-99) + 0:56 9:37 
11-667 280 + 0:46 | + 1-42 2-07 40-000 320 — 1:44| — 0-63 2-63 
12-000| 312 | —2-47|} — 4-04 23:30 41:000 | 328 | — 1-93] + 0-93 5-01 
e | 12:143 340 — 0:22| — 4-37 21-66 42-000 294 + 0-93] + 3:02 9:75 
12:333 296 — 2:44 | + 2-74 11-43 44-000 308 + 3-00) — 0-14 9:27 
12-500 325 — 1-22) + 2:63 9-13 45-000 315 + 1-69] — 1-99 7:14 
12-667 304 + 2:28| + 5-19 32-58 46-000 322 + 0-16] — 2-27 5:58 
12-800 320 + 5-70, + 3:26 46-01 48-000 288 — 0:76) — 0:09 0:56 
12-875 309 + 6:46 | + 0-77 43-58 50-000. 300 + 1:83] + 2-19 8:14 
13-000 312 + 4:26) — 4-32 38-23 52-000. 312 T 477| — 0:57 24-03 
13-333 320 + 0-40) + 0:37 0:32 53:000 318 + 4-22) — 2-60 26:08 
13-500 324 + 2-56) — 2-09 11-79 54-000 324 + 2-84| — 4-01 26-09 
13-667 328 + 3-49) — 1-34 15-28 55-000 330 + 3-54] — 3:30 25:82 
14-000 308 + 1:15} — 1-00 2-38 56-000. 336 + 3-31) — 2-36 18-47 
14-500 290 — 378); — 0-18 13-82 58-000 290 + 3-89} + 1-49 16-82 
14-667 308 — 1:50] + 4-23 20-69 60-000 300 — 3-08) — 0:93 10-32 
15-000 300 + 6-32) — 2-66 46-83 62-000 310 — 1:62| + 0:39 2-88 
15:200 304 + 1:19] — 8:52 75-04 64-000 320 — 0-78] + 0-13 0-66 
15:250 305 — 0:28| — 8-65 76:17 66-000. 330 — 0:56 | — 0-56 0:69 
15-286 321 — 2:35] — 7-15 60-62 68-000 340 + 2-90) — 1-88) 13:58 
15:333 322 — 39:89 | — 6:55 62-29 70:000 280 — 0:69| — 0-16 0:47 
15-500 310 — 6-92) — 2-02 59-11 74-000 296 — 1-20| + 0:82 2-07 
16-000 320 — 1-46) + 4-52 24-02 76-000 304 — 0:66 | + 1-17 1:83 
16-667 300 + 6-21) — 0-39 27-33 78:000 312 + 0:58| + 1-26 2-00 
17-000 306 + 2:56| — 6:35 47-84 80-000 320 T 0:77 | + 0:82 1:34 
17:333 312 — 3-04| — 6-65 54-55 84-000. 336 + 0:26| + 0-69) 0-62 
An examination of the periodogram suggests the possibility of 20 periods, as follows:


Period (Years) | Corrected Intensity N(A² + B²)/300 | Period (Years) | Corrected Intensity N(A² + B²)/300
2.735 | 7.82 | 11.000 | 33.84
3.417 | 15.84 | 12.000 | 23.30
4.417 | 16.48 | 12.800 | 46.01
5.100 | 42.34 | 15.250 | 76.17
5.415 | 23.06 | 17.333 | 54.55
5.667 | 32.72 | 20.000 | 37.88
5.933 | 23.63 | 24.000 | 26.10
7.417 | 21.72 | 35.000 | 23.11
8.091 | 23.23 | 54.000 | 26.09
9.750 | 33.89 | 68.000 | 13.58


This is evidently rather an embarrassing profusion of possibilities, and we cannot
immediately accept all these periods as significant. Sir William discussed them in detail
in the original paper and was inclined to attribute reality to 18 or 19 of them, partly on
grounds which do not concern us here, such as the existence of weather oscillations with
these periods. In particular, where a period had a high intensity he analysed the
two halves of the series separately to see whether the periods persisted, finding that most
of them did.

30.45. An inspection of the correlogram of the series in Fig. 30.5 reveals a striking 
difference between the two methods of analysis. From the correlogram we should be 
inclined to suspect a mean period of about 15 years, corresponding to the peak of greatest 
intensity in the periodogram, with a subsidiary ripple of about 5 to 6 years’ period, corresponding
to one or more of the peaks in the periodogram; but of the other 18 periods there
is no sign. The conclusion is inevitable that either the correlogram is insensitive or the 
periodogram is misleading. Having raised this highly important question we shall, unfor- 
tunately, have to leave it unsettled in part ; but we shall show that at least three-quarters 
of the periods thrown up for consideration by the periodogram are not significant. 


30.46. The calculation of the intensity $S^2$ depends on that of the quantities A and B
of equations (30.67) and (30.68). Suppose in the first place that our trial period $\mu$ is an
integer. We then write down the series in rows of $\mu$, thus:

$$\begin{array}{lccccc}
 & u_1 & u_2 & u_3 & \dots & u_\mu \\
 & u_{\mu+1} & u_{\mu+2} & u_{\mu+3} & \dots & u_{2\mu} \\
 & \vdots & \vdots & \vdots & & \vdots \\
 & u_{(p-1)\mu+1} & u_{(p-1)\mu+2} & u_{(p-1)\mu+3} & \dots & u_{p\mu} \\
\text{Totals} & m_1 & m_2 & m_3 & \dots & m_\mu
\end{array} \tag{30.73}$$

We continue writing down the rows until there are fewer than $\mu$ terms remaining, the
extra terms being left out of account. The number $p\mu$ is then as near in multiples of $\mu$
as we can get to the number in the series n, and may be denoted by N. This is sometimes
known as the Buys-Ballot table.

We then form the sum

$$\frac{2}{p\mu}\left\{m_1 \cos\frac{2\pi}{\mu} + m_2 \cos\frac{4\pi}{\mu} + \dots + m_\mu \cos\frac{2\pi\mu}{\mu}\right\}, \tag{30.74}$$

and this is clearly the quantity A of (30.67) for the series of N terms. Similarly we have

$$B = \frac{2}{p\mu}\sum_{j=1}^{\mu}\Big(m_j \sin\frac{2\pi j}{\mu}\Big). \tag{30.75}$$

If the trial period $\mu$ is a rational fraction with numerator $\nu$ we write the series down in rows of $\nu$ and
proceed in the same way; and if it is irrational or is a number which gives a large value
of $\nu$ when expressed as a fraction, we take two convenient neighbouring values of $\mu$ and
interpolate in the periodogram.


30.47. In actual practice we do not write down the array (30.73). The sums m
may be formed on an adding machine by starting with $u_1$ and then adding every $\mu$th member
to give $m_1$; then starting with $u_2$ and adding every $\mu$th member to give $m_2$, and so on.
Or alternatively, the values may be written on cards, one for each member of the series,
and the pack dealt into $\mu$ heaps. The total of the m's, together with any members left
over, equals the sum of the series and provides a check on the work.
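The dealing-out procedure of 30.46 and 30.47 translates directly into a short routine. The sketch below is illustrative (hypothetical names; an arbitrary made-up series, not the Beveridge data) and assumes an integral trial period. It forms the column totals, applies (30.74) and (30.75), and checks them against the direct sums (30.67) and (30.68) taken over the N terms actually used.

```python
import math

def column_totals(u, mu):
    """Deal the series into rows of mu, ignoring an incomplete last row.

    Returns the totals m_1..m_mu and the number N of terms used.
    """
    rows = len(u) // mu
    used = rows * mu
    totals = [sum(u[r * mu + c] for r in range(rows)) for c in range(mu)]
    return totals, used

def sums_from_totals(totals, used):
    """A and B from the column totals, as in (30.74) and (30.75)."""
    mu = len(totals)
    A = 2.0 / used * sum(m * math.cos(2.0 * math.pi * j / mu)
                         for j, m in enumerate(totals, start=1))
    B = 2.0 / used * sum(m * math.sin(2.0 * math.pi * j / mu)
                         for j, m in enumerate(totals, start=1))
    return A, B

def direct_sums(u, mu, used):
    """A and B from (30.67) and (30.68), restricted to the first `used` terms."""
    A = 2.0 / used * sum(u[j - 1] * math.cos(2.0 * math.pi * j / mu)
                         for j in range(1, used + 1))
    B = 2.0 / used * sum(u[j - 1] * math.sin(2.0 * math.pi * j / mu)
                         for j in range(1, used + 1))
    return A, B

u = [float((37 * j) % 23 - 11) for j in range(100)]   # arbitrary illustrative values
totals, used = column_totals(u, 7)
print(used, sums_from_totals(totals, used), direct_sums(u, 7, used))
# The check mentioned in 30.47: totals plus left-over members equal the whole sum.
print(sum(totals) + sum(u[used:]), sum(u))
```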


Example 30.8 

Consider the Beveridge series of Table 30.1. For the trial period 2 we may take 300
terms of the series, and $m_1$ (about zero mean) will be the sum of the values $u_1, u_3, \dots, u_{299}$
and $m_2$ will be the sum of the values with even subscripts. These sums are, for the years
1545 to 1844 inclusive,

$$m_1 = 14{,}909, \qquad m_2 = 14{,}893.$$

The mean is 14,901, so that about the mean of the series

$$m_1 = +8, \qquad m_2 = -8.$$

Now, for a trial period 2, $\sin(2\pi j/\mu)$ vanishes and hence B = 0. For A we have (in our notation,
which gives different signs from Beveridge’s to A and B)

$$A = \frac{2}{300}\{m_1 \cos \pi + m_2 \cos 2\pi\} = \frac{2}{300}\{m_2 - m_1\} = -\frac{32}{300}.$$

Thus

$$S^2 \text{ (corrected)} = \frac{300}{300} A^2 = 0.01,$$

as shown in Table 30.9.
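The arithmetic of this part of the example is reproduced below as a small sketch. The two column totals are those quoted in the text; the function name and its parameters are hypothetical.

```python
import math

def trial_period_two(m1_raw, m2_raw, n_used=300, n_reference=300):
    """A, B and the corrected intensity for trial period 2 from the two column totals."""
    mean_total = (m1_raw + m2_raw) / 2.0
    m1 = m1_raw - mean_total          # totals measured about the mean
    m2 = m2_raw - mean_total
    A = 2.0 / n_used * (m1 * math.cos(math.pi) + m2 * math.cos(2.0 * math.pi))
    B = 2.0 / n_used * (m1 * math.sin(math.pi) + m2 * math.sin(2.0 * math.pi))
    corrected = n_used * (A * A + B * B) / n_reference
    return A, B, corrected

A, B, corrected = trial_period_two(14909, 14893)
print(round(A, 3), round(corrected, 2))   # -0.107 0.01
```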
For a trial period 2.600, we could take $\mu = \tfrac{13}{5}$ and arrange the series in rows of 13,
requiring 23 rows accounting for 299 values of the series. We may, however, save ourselves
some arithmetic by taking 24 rows, a multiple of 4, occupying 312 observations.
Or rather, we take 6 rows of 52, giving us the values for a trial period 52; then add $m_1$
to $m_{27}$, $m_2$ to $m_{28}$ and so on, giving the result we would have got by taking 12 rows of 26
and hence providing the values for a trial period of 26; then we add again in the same way,
and so on, obtaining successively the values of m required for trial periods of 13, 6.5, and
3.25. Similarly, by multiplying the original 52 values of m by the respective values of
$\cos\dfrac{2\pi j}{52/3}$ and $\sin\dfrac{2\pi j}{52/3}$ we get the values of A and B required for a trial period of $\tfrac{52}{3}$. It is
thus evident that we can use the single set of 52 values of m to provide the required constants
for trial periods $\tfrac{52}{2}$, $\tfrac{52}{3}$, $\tfrac{52}{4}$ and so forth. This is the main reason why, in Table 30.9, 312
observations are shown as N for the trial periods 2.080, 2.261, 2.364, 2.476, 2.600, 2.737,
2.888, 3.250, 3.714, 4.333, 5.200, 6.500, 7.429, 8.667, 10.400, 13.000, 17.333, 26.000 and
52.000. The arithmetic, though difficult enough, is not as laborious as appears at first sight.
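The economy described above, in which one set of 52 column totals serves many trial periods, can be sketched as follows. This is an illustration on an arbitrary made-up series of 312 values (not the Beveridge data), with hypothetical function names: folding the 52 totals gives the 26 totals for rows of 26, and multiplying the 52 totals by cosines and sines of $2\pi jk/52$ gives A and B for the trial period 52/k.

```python
import math

def column_totals(u, width):
    """Totals of the columns when the series is written in complete rows of `width`."""
    rows = len(u) // width
    return [sum(u[r * width + c] for r in range(rows)) for c in range(width)], rows * width

def fold(totals):
    """Add m_1 to m_(h+1), m_2 to m_(h+2), ... where h is half the number of totals."""
    half = len(totals) // 2
    return [totals[i] + totals[i + half] for i in range(half)]

def sums_for_submultiple(totals, used, k):
    """A and B for the trial period len(totals)/k, k a positive integer."""
    angle = 2.0 * math.pi * k / len(totals)
    A = 2.0 / used * sum(m * math.cos(angle * j) for j, m in enumerate(totals, start=1))
    B = 2.0 / used * sum(m * math.sin(angle * j) for j, m in enumerate(totals, start=1))
    return A, B

u = [float((29 * j) % 17 - 8) for j in range(312)]    # arbitrary illustrative values
totals_52, used = column_totals(u, 52)
totals_26 = fold(totals_52)                            # same as dealing into rows of 26
print(totals_26 == column_totals(u, 26)[0])
print(sums_for_submultiple(totals_52, used, 3))        # trial period 52/3
print(sums_for_submultiple(totals_52, used, 20))       # trial period 2.600
```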


30.48. There is an interesting relation between the periodogram and the correlogram
by which the latter, in theory, determines the former. We consider, as in 30.38, a function
u(t) defined at every point of time in some range −h to h. Then

$$\alpha(p) + i\beta(p) = \frac{1}{h}\int_{-h}^{h} e^{ipt}\,u(t)\,dt = \frac{1}{h}\int_{-h}^{h} \cos pt\;u(t)\,dt + \frac{i}{h}\int_{-h}^{h} \sin pt\;u(t)\,dt \tag{30.76}$$

corresponds to the sums of (30.67) and (30.68) and may be written A + iB, where
$p = 2\pi/\mu$.

It follows that the intensity $S^2$ is related to the Fourier transform of r(k) by the relation,
derived from (30.63),

$$S^2 = \frac{2}{h}\,\phi_r(p) \tag{30.77}$$

$$= \frac{2}{h}\int_{-2h}^{2h} r(k) \cos kp\,dk, \tag{30.78}$$

which is true also in the limit, subject to conditions of existence. Thus the intensity is,
if r(k) exists over an infinite range, the quantity

$$\lim_{h\to\infty} \frac{2}{h}\int_{-2h}^{2h} r(k) \cos kp\,dk,$$

and if R(k) exists the parallel quantity

$$\int_{-\infty}^{\infty} R(k) \cos kp\,dk.$$

The periodogram is thus derivable from the autocorrelation function. Since the latter
does not uniquely determine the series the periodogram will not do so either.


Example 30.9
Consider the autocorrelation function, which in present notation may be written

$$R(k) = p^k\,\frac{\sin(k\theta + \psi)}{\sin\psi}.$$

This, as we have seen, represents the correlogram of an autoregressive series of the simple
linear kind involving $u_{t+2}$, $u_{t+1}$ and $u_t$. We may write this as

$$R(k) = e^{-qk}\,\frac{\sin(k\theta + \psi)}{\sin\psi}, \qquad q > 0,$$

since $p$ is less than unity. It is to be remembered that since $R(-k) = R(k)$, the modulus
of $k$ is to be used when $k$ is negative.
We have

$$S^2 = \int_{-\infty}^{\infty} e^{-q|k|}\,\frac{\sin(|k|\theta + \psi)}{\sin\psi}\cos kp\,dk$$

$$= \int_{-\infty}^{\infty} e^{-q|k|}\cos k\theta\,\cos kp\,dk$$

$$= \frac{q}{q^2 + (\theta + p)^2} + \frac{q}{q^2 + (\theta - p)^2}.$$

This is the intensity in the periodogram of the series, $p$ being the quantity $2\pi/\mu$ and not to
be confused with our original damping factor $p$.
It is remarkable that, as $q$ becomes large, $S^2$ tends to the constant value $2/q$,
that is to say, the periodogram tends to a fixed level, without peaks. From the analogy
with the analysis of light-rays into colours (each colour corresponding to a particular
harmonic), we may say that the periodogram develops a "continuous spectrum". In a
very interesting chapter on periodogram analysis Davis (1941) has given a number of
examples exhibiting this kind of effect.
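The closed form for the cosine integral in this example can be checked numerically. The sketch below is offered under stated assumptions: the function names and the parameter values ($q = 0.3$, $\theta = 1$, $p = 0.8$) are illustrative and do not come from the text. It verifies only the identity for $\int e^{-q|k|}\cos k\theta\,\cos kp\,dk$, and then shows the flattening of the spectrum when $q$ is large.

```python
import math

def spectrum_closed_form(q, theta, p):
    """q/(q^2 + (theta+p)^2) + q/(q^2 + (theta-p)^2)."""
    return q / (q * q + (theta + p) ** 2) + q / (q * q + (theta - p) ** 2)

def spectrum_numeric(q, theta, p, upper=80.0, steps=80000):
    """Trapezoidal value of the integral of exp(-q|k|) cos(k*theta) cos(k*p) over (-upper, upper)."""
    h = upper / steps

    def f(k):
        return math.exp(-q * k) * math.cos(k * theta) * math.cos(k * p)

    total = 0.5 * (f(0.0) + f(upper))
    for i in range(1, steps):
        total += f(i * h)
    return 2.0 * h * total  # the integrand is even in k

closed = spectrum_closed_form(0.3, 1.0, 0.8)
numeric = spectrum_numeric(0.3, 1.0, 0.8)

# With heavy damping (large q) the ordinate divided by 2/q stays close to unity for all p.
flat = [spectrum_closed_form(50.0, 1.0, p) * 50.0 / 2.0 for p in (0.0, 1.0, 2.0, 3.0)]
```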


Significance of a Periodogram 


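30.49. Suppose that the values $u_1, \ldots, u_n$ are random elements from a normal
population with variance $\sigma^2$. Then the function

$$A = \frac{2}{n}\sum_{j=1}^{n} u_j \cos\frac{2\pi j}{\mu}$$

is normally distributed with variance

$$\operatorname{var} A = \frac{4\sigma^2}{n^2}\sum_{j=1}^{n}\cos^2\frac{2\pi j}{\mu}
= \frac{2\sigma^2}{n} \text{ approximately}, \qquad (30.79)$$

and similarly

$$\operatorname{var} B = \frac{2\sigma^2}{n}. \qquad (30.80)$$

We also see that $\operatorname{cov}(A, B) = 0$ so that $A$ and $B$ are independent. Hence the joint
distribution of $A$ and $B$ is

$$dF = \frac{n}{4\pi\sigma^2}\exp\left\{-\frac{n}{4\sigma^2}(A^2 + B^2)\right\}dA\,dB. \qquad (30.81)$$

Thus the distribution of $S^2 = A^2 + B^2$ is

$$dF = \frac{n}{4\sigma^2}\exp\left(-\frac{nS^2}{4\sigma^2}\right)dS^2. \qquad (30.82)$$

The probability that $S^2$ exceeds $\dfrac{4\sigma^2}{n}\kappa$ in value is immediately obtainable as $e^{-\kappa}$.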
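As a rough check on these results, the exact variance of $A$ may be compared with $2\sigma^2/n$, and the Schuster probability evaluated directly. This is a sketch only: the function names and the values $n = 300$, $\mu = 7.3$ are assumptions chosen for illustration.

```python
import math

def var_a_exact(sigma2, n, mu):
    """(4 sigma^2 / n^2) * sum of cos^2(2 pi j / mu) for j = 1..n."""
    total = sum(math.cos(2.0 * math.pi * j / mu) ** 2 for j in range(1, n + 1))
    return (4.0 * sigma2 / n ** 2) * total

def schuster_tail(s2, sigma2, n):
    """P(S^2 > s2) for a random normal series: exp(-kappa), kappa = s2 / (4 sigma^2 / n)."""
    kappa = s2 / (4.0 * sigma2 / n)
    return math.exp(-kappa)

# The exact variance is close to, but not identical with, 2 sigma^2 / n.
ratio = var_a_exact(1.0, 300, 7.3) / (2.0 * 1.0 / 300)

# At the mean intensity 4 sigma^2 / n the tail probability is exp(-1).
tail_at_mean = schuster_tail(4.0 / 300, 1.0, 300)
```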


30.50. This result is due to Schuster (1898), but it gives only the probability that
a value of $S^2$ chosen at random will exceed a given value; whereas in the periodogram
we deliberately pick out the biggest values for inspection. Walker (1914) pointed out that
if $e^{-\kappa}$ is small the probability that all of $m$ independent values of $S^2$ should not exceed
$\dfrac{4\sigma^2}{n}\kappa$ is $(1 - e^{-\kappa})^m$, so the probability that at least one should exceed that amount is

$$1 - (1 - e^{-\kappa})^m. \qquad (30.83)$$

Davis (1941) gives tables of this function.
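Walker's function (30.83) is simple enough to evaluate without tables. A minimal sketch follows; the function name `walker_probability` is an assumption for illustration.

```python
import math

def walker_probability(kappa, m):
    """Probability that at least one of m independent intensities exceeds
    kappa times the mean intensity: 1 - (1 - exp(-kappa))**m, as in (30.83)."""
    return 1.0 - (1.0 - math.exp(-kappa)) ** m

# With a single intensity (m = 1) the expression reduces to Schuster's exp(-kappa).
single = walker_probability(3.0, 1)
many = walker_probability(6.0, 305)
```

With $m = 305$ and $\kappa = 6$ the probability is still rather more than one-half, which shows how little weight a single large ordinate can bear when many trial periods have been examined.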


30.51. Both the Schuster and the Walker tests depend on a knowledge of $\sigma^2$. Since
the mean value of $S^2$ in (30.82) is $4\sigma^2/n$, the usual procedure is to consider the test as a
comparison of $S^2$ with $E(S^2)$; but $\sigma^2$ itself has to be estimated from the original data.


30.52. Fisher (1929a) has given a test which avoids the inexactitude due to the
estimation of $\sigma^2$. If $v$ is the estimate and $S^2$ is the largest intensity, then the probability that

$$g = \frac{S^2}{2v} \qquad (30.84)$$

will exceed a given value is


$$\nu(1 - g)^{\nu-1} - \frac{\nu(\nu-1)}{2}(1 - 2g)^{\nu-1} + \ldots
+ (-1)^{m-1}\frac{\nu!}{m!\,(\nu - m)!}(1 - mg)^{\nu-1}, \qquad (30.85)$$

where $\nu = \tfrac{1}{2}(n - 1)$, $n$ being the (odd) number of observations, and $m$ is the greatest
integer less than $1/g$. The result was extended by Stevens (1939a); see also Fisher (1940a)
and Finney (1941a). Davis (1941) also gives tables of this function.
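The series (30.85) terminates after $m$ terms and is readily summed. The sketch below assumes the series in its usual form, with binomial coefficients $\binom{\nu}{j}$; the function name `fisher_g_tail` is hypothetical.

```python
import math

def fisher_g_tail(g, nu):
    """Sum of (-1)**(j-1) * C(nu, j) * (1 - j*g)**(nu - 1) for j = 1..m,
    where m is the greatest integer less than 1/g (and not more than nu)."""
    if not 0.0 < g <= 1.0:
        raise ValueError("g must lie in (0, 1]")
    m = min(nu, math.ceil(1.0 / g) - 1)  # greatest integer less than 1/g
    return sum(
        (-1) ** (j - 1) * math.comb(nu, j) * (1.0 - j * g) ** (nu - 1)
        for j in range(1, m + 1)
    )

# nu = 3 corresponds to n = 7 observations; for g = 0.4 two terms are needed:
# 3 * 0.6**2 - 3 * 0.2**2 = 0.96.
p_value = fisher_g_tail(0.4, 3)
```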


30.53. All the tests we have described are based on random normal variation in the
original series; but in practice nobody would embark on the labour of a periodogram
analysis unless he had satisfied himself that the data were not random. It seems to me,
therefore, that these tests are really off the main point, being tests based on a hypothesis
which we have already rejected. They are not without their usefulness, however. We
may assume with some confidence that if a particular intensity in the series is not shown
as significant on the hypothesis of random variation, it is not significant when the series
is systematic. What does not follow is that if one intensity is significant then others must
be so, even if they exceed the significance values; for they are not independent of the
significant value, at least for short series. What we ought to do, perhaps, is to extract
the component which is considered significant from the series and then analyse the
remainder; and so on as long as significant terms appear. But this is hardly a practical
computational possibility. Tests of significance in the periodogram, as in the correlogram,
remain undiscovered.


Example 30.10


Let us examine the significance of the 20 periods of the Beveridge periodogram given
in 30.44.
Sir William gave the value of $4\sigma^2/n$ in his original paper as 5.898. Expressing the
intensities as a multiple $\kappa$ of this amount, we find:


   Period.      κ.        Period.      κ.
    2.735      1.33       11.000      5.74
    3.417      2.69       12.000      3.95
    4.417      2.79       12.800      7.80
    5.100      7.18       15.250     12.91
    5.415      4.01       17.333      9.25
    5.667      5.55       20.000      6.42
    5.933      4.01       24.000      4.43
    7.417      3.68       35.000      3.95
    8.091      3.94       54.000      4.42
    9.750      5.15       68.000      2.30


There are 305 trial periods in Table 30.9. Let us consider the probability that at least
one of 305 independent values of $\kappa$ will exceed given values, that is to say, the probabilities
given by (30.83). We find:

      κ      Probability.
      2        1.000
      4        0.996
      6        0.531
      8        0.097
     10        0.014


On this basis we should be inclined to attribute significance to the period 15.25, for which
$\kappa = 12.91$. We have no right to be surprised that at least one value exceeds $\kappa = 6$. If
we take this value as the critical one, only the periods 5.100, 12.800, 15.250, 17.333 and
20.000 would be significant, that is to say, five out of 20.

Again, since $e^{-5} = 0.007$, we should expect to find in 305 independent members two
in excess of 5. Actually there are eight. But they are not independent and we cannot
rely on this comparison to say that six are significant. On the whole, however, it looks
as if at least three-quarters of the periods are not significant, and possibly more. The
example will illustrate the difficulty of testing the significance of the periodogram as a whole.
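The counts quoted in this example can be reproduced from the tabulated multiples. The sketch below is illustrative: the variable names are assumptions, and the values are as read from the table of this example, in which the period 7.417 and the multiple 5.15 for period 9.750 are uncertain readings of a damaged original.

```python
import math

# (period, kappa) pairs as read from the table of Example 30.10.
# Two entries are uncertain readings: the period 7.417 and the kappa 5.15 at 9.750.
TABLE = [
    (2.735, 1.33), (3.417, 2.69), (4.417, 2.79), (5.100, 7.18), (5.415, 4.01),
    (5.667, 5.55), (5.933, 4.01), (7.417, 3.68), (8.091, 3.94), (9.750, 5.15),
    (11.000, 5.74), (12.000, 3.95), (12.800, 7.80), (15.250, 12.91), (17.333, 9.25),
    (20.000, 6.42), (24.000, 4.43), (35.000, 3.95), (54.000, 4.42), (68.000, 2.30),
]
N_TRIALS = 305  # number of trial periods in Table 30.9

# Periods whose intensity exceeds the critical multiple kappa = 6.
significant = [period for period, kappa in TABLE if kappa > 6.0]

# Observed and expected numbers of intensities exceeding kappa = 5.
count_over_5 = sum(1 for _, kappa in TABLE if kappa > 5.0)
expected_over_5 = N_TRIALS * math.exp(-5.0)
```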


Lag Correlation. 

30.54. The idea of serial correlation can be extended to the joint variation of two
series. If we have two series $u(t)$, $v(t)$ in standard measure, we may define the lag
correlation of order $k$ as

$$r(k) = \int u(t)\,v(t + k)\,dt, \qquad (30.86)$$

where the integral includes summation in the case when the series are specified at
equidistant points of time. We note that in this case $r(k)$ is not equal to $r(-k)$ and $r(0)$
is not unity.
Table 30.10 shows the lag correlations between two series of English wheat prices and
horse populations (for the original series see Kendall, 1944a). The data are shown as a lag
correlogram in Fig. 30.10.


TABLE 30.10
Lag Correlations for Two Series of English Wheat Prices and Horse Populations (Deviations
from a Simple Nine-Year Average).


(The order of the correlation is the number of years by which horse population lags behind wheat price,
e.g. $r_{10}$ is the correlation of wheat price with the horse population of ten years earlier.)


   Order of                    Order of
  Correlation      r_k        Correlation      r_k
       k                           k

     -10         -0.22             1         -0.24
      -9         -0.19             2         -0.36
      -8         -0.24             3         -0.312
      -7         -0.16             4          0.16
      -6         -0.09             5          0.17
      -5          0.07             6          0.39
      -4          0.27             7          0.36
      -3          0.31             8          0.15
      -2          0.41             9         -0.16
      -1          0.25            10         -0.44
       0         -0.12


Fig. 30.10. Lag Correlation of Wheat Prices and Horse Populations (Table 30.10).


The systematic appearance is unmistakable and we notice in particular that the maximum
correlation occurs between the wheat price and the horse population of two years later.
This bears the obvious explanation that when a farmer earns more he buys or breeds more
horses; but it does not follow logically that this must be so or that there need be any
causal nexus between the two series. If two autoregressive series are oscillating with
mean periods which are close together and only a short span of experience is available for
scrutiny, then lag correlations of the damped sinusoidal type may appear, as it were, by
accident.
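For series given at equidistant points of time the integral of (30.86) becomes a sum, and the lag of maximum correlation can be located mechanically. The following sketch rests on assumptions: the divisor is taken to be the number of overlapping terms, the function names are illustrative, and the two sine series are artificial and are not the wheat-price and horse-population data.

```python
import math

def standardise(x):
    """Express a series in standard measure (zero mean, unit variance, divisor n)."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((value - mean) ** 2 for value in x) / n)
    return [(value - mean) / sd for value in x]

def lag_correlation(u, v, k):
    """Mean product of standardised u_t and v_{t+k} over the overlapping terms."""
    us, vs = standardise(u), standardise(v)
    n = len(us)
    products = [us[t] * vs[t + k] for t in range(n) if 0 <= t + k < n]
    return sum(products) / len(products)

# Artificial example: v repeats the movement of u two steps later.
u = [math.sin(2.0 * math.pi * t / 10.0) for t in range(100)]
v = [math.sin(2.0 * math.pi * (t - 2) / 10.0) for t in range(100)]

lags = range(-5, 6)
r = {k: lag_correlation(u, v, k) for k in lags}
best_lag = max(lags, key=lambda k: r[k])  # lag at which the correlation is greatest
```

In this artificial case the maximum falls at the lag of two steps and $r(k)$ differs markedly from $r(-k)$; as the text warns, such a maximum does not of itself establish a causal connection.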


30.55. We have now reached the end of our account of the statistical analysis of 
time-series and the end of this book ; and the final words we have to say of the one will 
apply generally to the other. Much has been left unsaid, partly from lack of space, partly 
from deficiencies in the present state of knowledge, and partly from a desire not to over- 
burden the reader. We have not avoided mathematical analysis where it was necessary 
to advance the argument; but we have insisted on the expression of results in numerical 
form and the necessity of experimental confirmation whenever it could be obtained. That 
there are gaps in the treatment we have given and unexplored branches of the subject 
to which we have barely referred are not entirely matters of regret; for the over-early 
and peremptory reduction of knowledge into arts and methods is one of the errors which 
Bacon cautioned us against more than 300 years ago. Much remains to be done; and this 
book will have served its purpose if the reader is left with the desire to do some of it himself. 


NOTES AND REFERENCES 


The theoretical aspects of the autoregressive series and of moving averages are dis- 
cussed in Wold's book on The Analysis of Stationary Time-Series (1938a). The basic 
memoir is that by Yule (1927a) on sunspots. For applications to meteorology see Walker 
(1931) and to economics Kendall (1944a). Davis's book on The Analysis of Economic Time 
Series (1941) contains a great deal of interesting material but should not be read uncritically. 
Two earlier papers by Yule (1921 and 1926) are also of interest. See also my paper on 
“The Analysis of Oscillatory Time-Series " in the Journal of the Royal Statistical Society 
for 1945, a paper by Yule in the same journal, my brochure (in press) on “ Researches in 
Oscillatory Time-Series ", and a symposium introduced by Bartlett in the Supplement to 
the Journal for 1946. 

The classical work on periodogram analysis is that of Schuster (1898). The books 
by Brunt (1931) on The Combination of Observations and by Whittaker and Robinson 
(1940) on The Calculus of Observations contain useful introductory accounts ; and Davis’s 
book referred to above has an excellent chapter illustrated with an unusual number of 
examples. Papers by Crum (1923) and Greenstein (1935) are of interest. The papers by 
Sir William Beveridge (1921, 1922) on wheat prices and rainfall have been justly described 
by Davis as a heroic piece of periodogram analysis. Tables facilitating the calculation 
of intensities were published by Turner (1913), and more complete tables will be given in 
my brochure referred to above. See also the book by Stumpff (1937). 

Various short-cut methods of periodogram analysis have been proposed by several 
authors, e.g. Oppenheim (1909), Bruns (1921) and Alter (1933, 1937); but their value is 
problematical. There is a useful memoir by Bartels (1935) which is worth studying.


EXERCISES


30.1. For the autoregressive series

$$u_{t+2} + a u_{t+1} + b u_t = \varepsilon_{t+2}$$

show that if $\varepsilon$ is a random variable and the series is long,

$$\frac{\operatorname{var} u}{\operatorname{var} \varepsilon} = \frac{1 + b}{(1 - b)\{(1 + b)^2 - a^2\}}$$

and hence that the variance of the generated series may be much greater than that of
$\varepsilon$ itself.


30.2. For the autoregressive series of the previous exercise use the relation

$$\rho_{k+2} + a\rho_{k+1} + b\rho_k = 0, \qquad k > -1$$

to derive the relation

$$\rho_k = \frac{p^k \sin(k\theta + \psi)}{\sin\psi}.$$


30.3. If the estimated coefficients a’ and b’ in the autoregressive scheme are reduced 
in the manner of 30.32 by a superposed error, show that 
b b 
TRUE ae 
a^ a 
(Yule, 1927a.) 


30.4. Show that if, in the autoregressive scheme of Exercise 30.1, b = 1, the series 
becomes undamped and the correlogram reduces to a simple harmonic. Examine the 
effect on the solution (30.23). 


30.5. If any series has fitted to it a series generated by the scheme of Exercise 30.1,
$a$ and $b$ being any constants, show that for the serial correlations of the residuals, say $\varrho_k$,
we have

$$\varrho_k = \frac{(1 + a^2 + b^2)\rho_k + a(1 + b)(\rho_{k+1} + \rho_{k-1}) + b(\rho_{k+2} + \rho_{k-2})}
{1 + a^2 + b^2 + 2a(1 + b)\rho_1 + 2b\rho_2}.$$


30.6. Show that the series with an autocorrelation function 


does Ee 


has a periodogram which is zero for periods less than 4 and has ordinate H for periods greater 
than Y Le. has a continuous spectrum. 


30.7. In equation (30.71), noting that the dominant term vanishes for x«—p- EE 
n 


E 


where m is an integer, show that for such a “ vanishing " trial period y 


A -—À ( eu). approximately. 


= 


fk
2
Hence the width of a peak in the periodogram is approximately E and the main peak
will be flanked by smaller peaks of the same width. (This "side-band" effect is another
complication in the interpretation of the periodogram, but not apparently a very serious
one.)


30.8. If a series of values $u_1, \ldots, u_n$ is supplemented by a number of zeros as
$u_0, u_{-1}, u_{-2}, \ldots, u_{n+1}, u_{n+2}$, etc., as far as is necessary, and the resulting series differenced,
show that

$$V_j = \binom{2j}{j}P_0 - 2\binom{2j}{j-1}P_1 + 2\binom{2j}{j-2}P_2 - \ldots + (-1)^j\,2P_j,$$

where $V_j$ is the sum of squares of $j$th differences and $P_j = \sum_{i=1}^{n-j} u_i u_{i+j}$. Hence show that
the arithmetic of serial correlation may be related to that of the variate-difference method,
and vice-versa.


30.9. Show that the serial correlations of a long series obtained by differencing a
random series $m$ times are given by

$$r(k) = (-1)^k\,\frac{m(m-1)\ldots(m-k+1)}{(m+1)\ldots(m+k)}$$

and hence that the correlogram of such a series oscillates.


(Yule, 1921.)
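Before attempting the proof the reader may care to verify the formula of Exercise 30.9 numerically. The sketch below compares it with the correlations implied by the weights $(-1)^i\binom{m}{i}$ of the $m$th difference of a random series; the function names are assumptions for illustration.

```python
from math import comb

def formula_correlation(m, k):
    """(-1)^k * m(m-1)...(m-k+1) / ((m+1)...(m+k))."""
    numerator = denominator = 1
    for i in range(k):
        numerator *= m - i
        denominator *= m + 1 + i
    return (-1) ** k * numerator / denominator

def weights_correlation(m, k):
    """Serial correlation of the m-th difference of a random series, from its weights."""
    w = [(-1) ** i * comb(m, i) for i in range(m + 1)]

    def product_sum(lag):
        return sum(w[i] * w[i + lag] for i in range(m + 1 - lag))

    return product_sum(k) / product_sum(0)

# For third differences the correlations are -0.75, 0.3 and -0.05, alternating in sign.
third = [formula_correlation(3, k) for k in (1, 2, 3)]
```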


30.10. The Whittaker periodogram. Writing 


where var u is the variance of the series and var m is the variance of the sums m of (30.73), 
show that if 
Uy = asin 2 + by, 
where b, is uncorrelated with periodic terms, then 
a? u? sin? wes 
BN? aint 4- 5" var b 


2 = 
7 (x) la? + varb 
Hence show that, in the neighbourhood of 4, the graph of 7 as ordinate with x as abscissa 
2/2 


(Whittaker's periodogram) has a peak of breadth N flanked by smaller peaks. 


(Whittaker, Month. Notes R. Astr. Soc., 1911, 71; cf. Whittaker and Robinson, Calculus of
Observations.)


APPENDIX A 
ADDENDA TO VOLUME I 


(1) Frequency and Distribution Functions 

An interesting paper by Burr (1942) considers the possibility of fitting elementary 
mathematical functions, not to the frequency function as has been the almost universal 
practice hitherto, but direct to the distribution function. This approach seems to merit 
further attention. In general, the distribution function has fewer analytical peculiarities 
than the frequency function—for instance, it cannot be infinite—and in applications to 
sampling it is the former which is nearly always required. The frequency function can, 
of course, be derived from the distribution function to a close approximation by differ- 
encing, or differentiation, processes which are usually easier to carry out than the inverse 
processes of integration. 


(2) Extension of the Carleman Criterion (4.22)
Cramér and Wold (1936) have extended Carleman's criterion for uniqueness in the
problem of moments in the following form:
If

$$\lambda_{2n} = \mu_{2n,0,\ldots,0} + \mu_{0,2n,\ldots,0} + \ldots + \mu_{0,0,\ldots,2n},$$

the distribution is completely determined by its moments if

$$\sum_{n=1}^{\infty} \lambda_{2n}^{-1/(2n)}$$

diverges. It is rather interesting that the criterion is independent of the product-moments.


(3) Convergence of Series Leading to Standard Errors 


The usual type of expansion in differentials, exemplified in 9.6, raises a point of mathe- 
matical difficulty in that the differentials themselves and the remainder terms, though 
usually small, may sometimes be large for sampling reasons, however large the sample. 
The necessary rigorisation of the process has been given by Derkson (1939) in terms of the
notion of stochastic convergence, that is to say, a sort of statistical convergence in which 
the series converges nearly always in a precisely defined sense. 


(4) Moments of Moments for Finite Populations 


The formulae for moments of the mean and variance in samples from a finite population 
were stated without proof in 11.26. It is obvious that if in these results we let N, the 
population number, tend to infinity, we obtain the formulae for sampling from an infinite 
population. Irwin and I (1944) have recently shown that the process may be reversed 
and the formulae for the finite case derived from those for the infinite case. This offers 
the simplest and most direct method of deriving the formulae known to me. Reference 
may also be made to Sukhatme, “ On Bipartitional Functions " (Phil. Trans., 1938, A 
237, 375) and “ Moments and Product-Moments of Moment-statistics for Samples of the 
Finite and Infinite Populations" (Sankhyā, 1944, 6, 363).


(5) Tied Ranks

In the treatment of rank correlation in Chapter 16 it was assumed that ranking was
always possible; but in practice cases occur when two or more individuals "tie" and the
ranks have to be equalised in some way. This possibility introduces the most intractable 
complications into theoretical work, but sometimes ties occur so frequently that a systema- 
tic method of dealing with them is necessary. The subject has been reviewed and recon- 
sidered by Woodbury (1940) and more recently by myself (Biom., 1945, 33, part 3). 


(6) Coefficients of Rank Correlation 

Daniels (1944) has recently unified the theory of rank correlation by showing that 
Spearman's ρ, my τ and the product-moment coefficient are particular cases of a general
coefficient. In particular he has demonstrated the formula for the covariance of ρ and τ
given in 16.24 as very probably true.


APPENDIX B 
BIBLIOGRAPHY 


The following Bibliography has no pretensions to completeness in spite of its length. 
It contains about half the titles recorded in my own notes, which themselves are doubtless
far from comprehensive. Nevertheless, I hope it will be useful to those readers who want
to take their studies of particular subjects somewhat further. By consulting the references
given here and following up the references which they themselves provide, it should be
possible for the reader to acquaint himself with most of what is known, or at least with
what is worth knowing, about a particular topic.

The names of authors are not included in the Index (pages 504 ff.) unless they occur
in the text, since the Bibliography itself is arranged alphabetically under authors’ names. 
The subjects, however, are indexed, and anyone wishing to consult references on a par- 
ticular topic should refer in the first place to the Index, which in turn will refer to the 
authors who have dealt with the matter in question. 

In general the Bibliography contains only references to theoretical papers; applica- 
tions and illustrative material are included only when some theoretical point is involved. 
Papers which have been superseded by later work are omitted, except where they have 
a historical interest. 

In compiling this material I have been particularly indebted to the valuable periodical 
reviews of Recent Advances in Mathematical Statistics by Irwin, Hartley and others in 
the Journal of the Royal Statistical Society: 1932, 95, 498; 1934, 97, 114; 1935, 98, 
88; 1936, 99, 714; 1938, 101, 394; 1939, 102, 406; and 1940, 103, 534. 

Many papers written since 1939 are included, but some journals are not available in 
war-time so that foreign work published after the entry of various countries into the war 
may be incompletely represented. Where possible, the references have been checked 
against the original publications, but here also I have had to rely on second-hand references 
in cases where the original papers were inaccessible. 

Note.—Names beginning with de, del, le, St., van, von, etc., are entered under those 
titles, i.e. the order is strictly alphabetical.


Age and audible pitch, (Example 22.4) 152-3,
(Example 22.5) 155-6. 

Agricultural statistics, bibliography of, Bibl., 
Wishart (1934a, b) 501. 

Aitken, A. C., minimum variance, 51, (Exercises 
18.1 and 18.2) 61; N.R., 61, 173. 

Allan, F. E., orthogonal polynomials, 161, (Exer- 
cise 22.4) 173; N.R., 173, 245. 

Almost periodic functions, Bibl.: Besicovitch 
(1932) 446, Bohr (1925) 447, Hartman 
and others (1938) 467, Kerchner and 
Wintner (1936) 473, van Kampen (1939a) 
496, Wintner (1934b) 500. 

Alter, D., N.R., 437. 

Amount of information, in estimation, 29-30. 

Analysis of variance, generally, 175-246; one-
way classifications, 175-6; two-way classi-
fications, 181-7; three-way classifications,
187-8; interactions, 188-9; n-way classi-
fications, 189-98; arithmetic of, 198-9;
z-test in, 199; factorial experiments, 199-
202; in non-normal data, 205-16 ; variate 
transformations, 206-9; randomisation, 
209-13 ; randomised blocks, 213-14 ; rank- 
ing tests, 214-15; estimation of class- 
differences, 218-19; different numbers in 
sub-classes, 220-8 ; factorial classifications, 
228-9; missing plot technique, 229-33 ; 
relation with regression analysis, 233-7; 
covariance analysis, 237-45. 

Bibl.: Bartlett (1936d,e) 445; Beall 
(1942) 446; Bliss (1938) 447; Brandt 
(1933) 449; Clark and Leonard (1939) 452 ; 
Cochran (1935, 19370, 19396, 1940b) 452; 
Comrie and others (1937) 452; Curtiss 
(1943) 454; Daniels (19386) 455; Fieller 
(1940) 460; Hendricks (1935) 468; P. L. 
Hsu (1940, 19415) 469; Irwin (1931, 1934,
1942) 470; E. S. Pearson (19315) 482;
Roy (19396, 1942a, b) 489; Schultz and 
Snedecor (1933) 490; Snedecor and Cox 
(1934a) 492 ; Snedecor (19345) 492; P. C. 
Tang (1938) 494; Wald (1940b) 497, 
(1941d) 498; Wilks (1932e, 1937b) 499, 
(1938e) 500; Yates (1938c) 502. 

See also Fisher's Distribution, Replica- 
tion, Blocks, Design, etc. 

Analysis situs, Bibl., Hotelling (1927) 469. 

Ancillary estimators, 32-3. 

Anderson, O., variate-difference method, 391, 393. 
N.R., 394. 

Andersson, W., N.R., 172; (Exercise 22.5) 174. 

André, D., N.R., 136. 

Animal experiments, Bibl., Wishart (1939) 501. 

Association, Bibl.: S. S. Bose and Mahalanobis 
(1938a) 448, M. Greenwood and Yule (1915) 
466, K. Pearson and Heron (1913c) 484, 
K. Pearson (1913d) 484, Yule (1900, 1912) 
502. 

Asymmetrical frequency-distributions, Bibl., Hans- 
mann (1934) 467. See also Gram-Charlier 
Series, Pearson Distributions. 

Asymptotic distributions, Bibl., Hartman and
others (1939) 468, Haviland (1939) 468. 
See also Convergence in Probability. 

Attributes, significance in k samples, 119-20. 

Attributes, sub-sampling for, Bibl., Bartlett (1937a) 445.

Autocorrelation, see Serial Correlation, Correlogram.

Autocorrelation function, 421-3.

Autoregression equations, 399 ; (Table 30.4) 401; 
406-8; period of, 414-21. See also Serial 
Correlation, Correlogram. 

Average, accuracy of, Bibl.: Bowley (1912) 448, 
Keynes (1911) 473. See also Mean, Median, 
Mode. 


Balance, in design, 263-5. Bibl.: R. C. Bose 
(1939) 448, R. C. Bose and Nair (1939) 448, 
R. C. Bose (1942a) 448, Cox (1940) 453, 
K. R. Nair and Rao (1942) 479, Neyman and 
Pearson (1938d) 480, E. S. Pearson (19375, 
1938) 483, “Student” (1938) 493, Weiss 
and Cox (1939) 498, Yates (1938a, 1940) 
502. 

Barbacki, S., N.R., 266.

Barley yields, (Table 29.1, Figure 29.1) 304.

Barnard, M. M., (Example 28.3) 345-8; N.R., 359.

Bartels, J., N.R., 437.

Bartlett, M. S., distribution of 7, 103; conditional
tests, 127; k samples, 299, 323; stabilising
variance, 207-8; Wishart’s distribution, 
333. Exercises from: (21.7) 139, (21.10) 
139, (21.11, 21.13, 21.14) 140, (27.2) 326, 
(28.2) 360, (28.12) 362. N.R., 45, 83, 94, 
136, 245, 304, 359, 437. 

Bayes' theorem and postulate, in estimation, 58-9 ; 
in relation to fiducial inference, 90-1, 93-4. 
Bibl.: Bayes (1763) 446, Berkson (1930) 
446, Burnside (1924) 450, Molina (1931) 
478, E. S. Pearson (1925) 482, K. Pearson 
(1920a) 485, von Mises (1938) 497, Wishart 
(1927) 500. 

Beall, G., N.R., 216.

Behrens’ test, 82, 91-4, 111-12. See Two Samples. 

Belonging coefficient, Bibl., Kullback (1935c) 474. 

Bessel function distribution, (Exercise 28.2) 359-60.
Bibl.: R. C. Bose (1938a) 448, S. S.
Bose (1938a) 448, Fieller (1932a) 460, 
McKay (1932) 477, K. Pearson (1933a) 486, 
K. Pearson and others (1932a) 486. 

Best critical regions, 272, 275-8. 

Beta (measure of skewness and kurtosis), Bibl., 
McKay (1933) 477.

Beta-function, Bibl., Müller (1931) 479, Thompson 
and others (1941a) 494. 

Beveridge, Sir William, (Table 30.1) 396, N.R., 
437. See Wheat-price Index. 

Bias, in estimation, 3-4; in statistical tests, 
307-27. Bibl.: Daly (1940) 454, Neyman 
and Pearson (1936, 1938) 480, Neyman 
(1935b) 480, Yates (1935a) 502. 

Bimodal distributions, transformations of, Bibl., 
Baker (1930a) 444. 

Binomial, confidence intervals for, (Example 19.2) 
66-9; tables of, 81. 

—, generally, Bibl.: Ayyangar (1934) 444, 
Camp (1924) 450, Clopper and Pearson 
(1934) 452, Cochran (1936, 1937a, 19406) 
452, Fisher (19410) 462, Greenwood and 
Yule (1920) 466, Kullback (19350) 474, 
Lurquin (1937) 476, K. Pearson (1915) 
484, Romanovsky (1923) 489. 

Biological assays, Bibl., Irwin (19375) 470. 

Births, proportion of males in, (Example 21.8) 120. 

Biserial coefficients, Bibl. : Newbold (1925) 479, 
K. Pearson (1909, 1910) 484, (1917) 485, 
Soper (1914) 492. 

Bishop, D. J., N.R., 304, 359.

Bivariate surfaces, Bibl. : Narumi (1923a) 479, 
Nicholson (1943) 481, Pretorius (1930) 487, 
Rhodes (1923, 1925) 488, Ritchie-Scott 
(1921) 489, Villars and Anderson (1943) 
496. 

Blocks, randomised, 213-14. Bibl.: R. C. Bose
(1939) 448, R. C. Bose and Nair (1939) 448,
R. C. Bose (1942a) 448, Cornish (1940a, b, c)
453, Cox (1940) 453, Fisher (1940, 1942a)
462, Goulden (1937) 465, Kishen (1942)
473, Nair and Rao (1942) 479, Nair (1943)
479, Savur (1939) 490, Yates (19365, 1939a,
1940) 502.

Bose, C., N.R., 266. 

Bose, R. C., N.R., 359. 

Bowley, A. L., N.R., 260. 

Brady, J., N.R., 245. 

Brandt, A. E., (Example 24.1) 221-5, N.R., 245. 

Breeds of pig, (Example 24.1) 221-5, (Example 
24.2) 225, (Example 24.3) 226-7, (Example 
24.4) 229, 

Brookner, R. J., N.R., 304. 

Brown, G. W., bias in tests, 323, N.R., 304. 

Brown-Spearman formula, Bibl., Wherry (1935)
499. 

Bruns, H., N.R., 437. 

Brunt, D., rainfall data, (Table 29.4) 367, N.R., 
437. 

Burr, I. W., distribution functions, 440. 

Buys-Ballot table, 430. 


Calculating machines, Bibl.: Comrie (1936) 452, 
Hey (1938) 468, Mallock (1933) 477. 
Canonical correlations, 348-58. Bibl.: Bartlett 
(1941) 445, Hotelling (1936b) 469, P. L. 
Hsu (1941a) 469. See Multivariate Analysis.

Carleman criterion, 440. 

Cauchy population, estimation of location, 2,
(Example 18.2) 51; median in, (Example
17.4) 6; approximation to estimator for, 
(Example 17.11) 23-4 ; loss of information, 
(Example 17.16) 32. 

Cave, B. M., N.R., 394.

Cement, specification of, Bibl., Wilsdon (1934) 
500. 

Central confidence intervals, 66. 

—— limit theorem, Bibl.: Bernstein (1927, 
1936) 446, Bochner (1936) 447, Feller 
(19360, 1937) 460, Gnedenko (1938) 465, 
Liapounoff (1900, 1901) 476, Lindeberg 
(1922) 476, Madow (1939) 476, Pólya (1920) 
487. See Convergence in Probability. 

Centre of location, 41.

Chains, in probability, see Markoff Process.

Characteristic equation, Bibl., Horst (1935) 469,
Samuelson (1942) 490. 

Characteristic functions, Bibl.: Boas and Smithies
(1937) 447, Dugué (1939) 458, Glivenko
(1936) 465, Haviland (19345, 1935) 468, 
Kullback (1934, 19366) 474, Kunetz (1936) 
474, Wintner (1936) 500. 

Charlier’s series, see Gram-Charlier Series. 

Chi-squared (χ²) minimum, 55-8; in testing
goodness of fit, 106-7; in testing hypo-
theses, 299, 302; generalisation in multi-
variate analysis, see Wishart's Distribution.

Chi-squared, generally, Bibl. : Aroian (1943) 444, 
Berkson (1938) 446, Brownlee (1924a) 449, 
Camp (19385) 450, Cochran (1936a, 1942a) 
452, Deming (1934, 1938) 456, Eisenhart 
(1938) 459, El Shanawany (1936) 459, Fisher 
(1922, 1928c, 1924d) 461, Fry (1938) 464, 
Grüneberg and Haldane (1937) 466, Gumbel 
(19430) 466, Haldane (1937, 1938, 1939, 
1940) 467, Hoel (1938) 468, Irwin (19295) 
470, Jeffreys (1938), 1939b) 471, Johnson 
and Welch (1939) 471, Koshal (1939) 474, 
Mann and Wald (1942) 477, Merrington 
(1941) 478, Neyman and Pearson (1931a) 
480, K. Pearson (1900c) 483, (19166, f, 
1922a, 1923) 485, (19325) 486, Robinson 
(1933) 489, Seal (1940) 490, K. Smith (1916) 
492, Snedecor and Irwin (1933) 492, Su- 
khatme (1937a, 19384) 494, C. M. Thompson 
(19415). 494, Wilson and Hilferty (1931a) 
500, Wilson and others (19315) 500, Yates 
(1934b) 502, Yule (1922) 503. 

Clitic curve, 142.

Clopper, C. J., confidence limits for a binomial, 81.

Closeness, in estimation, Bibl., Geary (1944) 464.

Closure, Bibl., Stekloff (1914) 492.

Cochran, W. G., on Fisher's distribution, 117, 199;
elimination of variates, 170, (Example 
22.10) 171; theorem on sum of squares, 
177-8; N.R., 136, 216. 

Cograduation, Bibl., Gini (1939) 465, Salvemini
(1939) 490. 

Combination of tests, 132-3. Bibl. : David (1934) 
455, E. S. Pearson (1938) 483, K. Pearson 
(19335) 486, Wallis (1942) 498. 

— of observations, Bibl.: Bruen (1938) 449, 
Brunt (1931) 449, Mather (1935) 477. See 
Errors, general theory of. 

Compatible events, Bibl., Gumbel (19386) 466. 

Complete sufficiency, in estimation, 40. 

Complex experiments, Bibl., Yates (1935b) 502. 
See Design. 

Composite hypothesis, 269, 282-3, 287-92, 316-17. 

Compound frequency-distributions, Bibl., Hel-
guero (1900) 468, K. Pearson (1915) 484.
See Bimodal.

Concentration, Bibl.: Castellano (1933a, b, 1937)
451, Galvani (1932) 464, Gini (1932) 465,
Pietra (1932a) 486, von Schelling (1934)
497, Wold (1935) 501.

Concordance, Bibl., Gini (1916) 465.

Concordant samples, 128. 

Conditional statistics, (Exercise 21.10) 139; N.R.,
45. Bibl., Bartlett (1938b) 445.

Conditional tests, 127-8, 134.

Confidence, belt, 63; coefficient, 63 ; intervals, 
62-84; for one parameter, 62-5 ; central 


and non-central, 66-9; for large samples, 
69-71; shortest sets, 71-4; sufficient 
estimators, 74-5; for several parameters, 
76-9, 81-2; studentisation in determining, 
79-81; tables of, 81; limits, 63. 

Bibl. : Clopper and Pearson (1934) 452, 
David (1937, 1938a) 455, Frankel and Kull- 
back (1940) 463, Kolmogoroff (1941) 474, 
K. R. Nair (19406) 479, Neyman (1937), 
1941a) 480, E. S. Pearson (1932) 482, 
Pearson and Sukhatme (1935b) 482, Ricker
(1937) 488, W. R. Thompson (1936) 494, 
Wald and Wolfowitz (19395) 497, (1941c) 
498, Wald (1942a) 498, Welch (1939a) 498, 
Wilks (1938b,c) 499, (1939a) 500, Wilks 
and Daly (19395) 500. 

Configuration of sample, 127. 

Confluence analysis, Bibl.: Cobb (1939) 452, 
Frisch (1934b) 464, Mendershausen (1939)
478, Reiersöl (1940, 1941) 488.

Conformity, index of, Bibl., Solomon (1939) 492. 

Confounding, 262-3. Bibl. : Barnard (1936) 444, 
R. C. Bose and Kishen (1941) 448, Fisher 
(1942c) 462, K. R. Nair (1938b, 1941) 479,
Yates (1933a) 501. See Design. 

Consistence, of estimators, 3, 12-15. 

Contagious distributions, Bibl., Feller (1943) 460, 
Neyman (1939a) 480. 

Contingency, Bibl. : Bartlett (1935b) 445, Blake- 
man and Pearson (1906) 447, Harris and 
Treloar (1927) 467, Hirschfeld (1935) 468, 
Kondo (1929) 474, K. Pearson and Blake- 
man (1906) 484, K. Pearson (1900a, b) 483, 
(1904) 484, (1916b) 485, Stevens (1938a) 
493, Weida (1934) 498, Wilks (1935a) 499, 
Yates (1934b) 502, Young and Pearson 
(1916) 502. 

Continuous spectrum, in periodogram, 433. 

Convergence in probability, Bibl. : Cantelli (1916, 
1917, 1923, 1933a, 1935) 450, Cramér (1934) 
454, Dodd (1926, 1927) 456, Doeblin (1938, 
1939) 457, Dugué (1937a) 458, Feller (1937) 
460, Fréchet (1930) 463, Jordan (1933) 472, 
Kolmogoroff (1937a) 473, Kozakiewicz 
(1937, 1938) 474, Lévy (1935b, 1936c, 1939a) 
475, Messina (1933) 478, Romanovsky 
(1932b) 489, Slutzky (1925, 1937a) 491. 
See also Central Limit Theorem. 

Convolutions, Bibl., van Kampen (1937a) 496, van 
Kampen and Wintner (1937b, c) 496. 

Cornish, E. A., on Fisher's distribution, 116, 
N.R., 136. 

Corrections, for grouping see Grouping Correc- 
tions ; to correlations, Bibl., Roff (1937) 489. 

Correlated observations, sampling from, Bibl. : 
A. T. Craig (1933b) 453, C. C. Craig (1931a) 
453, (1932) 454, Rhodes (1927) 488. See 
also Time-series.

Correlation, confidence intervals for coefficient,
81; Pitman's test for, 131-2 ; significance 
of, 235. 

Bibl.: Baker (1930b) 444, Bilham (1926)
447, Bispham (1920, 1923) 447, Bonferroni 
(1939) 447, Brander (1933) 449, W. Brown 
(1909) 449, Brownlee (1910, 1925) 449, 
Cheshire and others (1932) 451, Cochran 
(1937a) 452, Coleman (1932) 452, Cowles 
and Chapman (1935) 453, Day and Fisher 
(1937) 455, David (1937, 1938) 455, G. R. 
Davies (1930) 455, de Lury (1938) 456,
Deming (1937) 456, Dieulefait (1934a, 
1935a) 456, S. C. Dodd (1937) 457, Dunlap 
(1931) 458, Eells (1929) 459, Ezekiel (1930a) 
459, Fischer (1933a, b) 460, Fisher (1915, 
1918, 1921c, 1924a) 461, Fréchet (1933) 
463, Frisch (1929) 463, Frisch and Mudgett 
(1931) 463, Garwood (1933) 464, Geary 
(1927) 464, Gehlke and Biehl (1934) 464, 
Geiringer (1933) 464, Jeffreys (1939c) 471, 
Khintchine (1928) 473, Kuzmin (1939) 474,
Lindblad (1937) 476, Merzrath (1933) 478, 
A. N. K. Nair (1942) 479, Newbold (1925) 
479, E. S. Pearson (1923, 1924, 1931a, 1932) 
482, K. Pearson (1897b, 1900a, b, 1902a)
483, (1904, 1905, 1907a, 1909, 1910, 1913a, b, 
1914, 1921) 484, (1920b, 1925b) 485, Pitman 
(1939c) 486, Prokopovie (1935) 487, Quensel 
(1938) 487, Rider (1932) 488, Romanovsky 
(1925a) 489, Soper (1913, 1914, 1917) 492, 
Steffensen (1934) 492, Stouffer (1934, 
1936a, b) 493, “Student” (1908b) 493,
Thorndike (1937) 494, Thouless (1939) 495,
Tschuprow (1925, 1928) 495, (1934) 496, 
Wicksell (1917a, b, 1921, 1933) 499, Yasu- 
kawa (1925) 501, Yule (1897a, b, 1906, 
1907, 1910) 502. 
See also Multiple Correlation, Regression. 

— ratio, Bibl.: Hotelling (1925) 469, Isserlis 
(1914, 1916) 470, Kelley (1935) 472, Mussel- 
man (1926) 479, E. S. Pearson (1927) 482, 
K. Pearson (1905, 1910, 1911a, b, 1915a) 
484, (1917, 1923b) 485, “Student” (1913)
493, Wallis (1939) 498, Wishart (1932a) 500. 

Correlogram, 404-12; significance of, 412-13; of 
general linear series, 420-1; relation with 
periodogram, 432-3. 

Cost of living, Bibl. : Bennett (1920) 446, Bowley 
(1919) 448, Konüs (1939) 474.

Cotton yarn, Bibl., Tippett (1935) 495. 

Counting experiments, Bibl., Peierls (1935) 486, 
Tippett (1932) 495. 

Coutts, J. R. H., data from, (Table 22.1) 150. 

Covariance, analysis of, 237-45. Bibl.: Bailey 
(1931) 444, Bartlett (1935d, 1936c) 445, 
Brady (1935) 449, Cochran (1934) 452, 
Cornish (1940c) 453, Cox and Snedecor
(1936) 453, Hirschfeld (1937) 468, K. R.
Nair (1940a) 479, Snedecor (1935) 492, Wilks 
(1936) 499, (1938c) 500, Wishart (1936) 501. 

Covariance, distribution of, (Example 28.1) 334-5. 

Cramér, H., ω²-test, 108-9; Carleman criterion,
440. 

Critical region, 270, (Example 27.2) 312-13. 

Crop estimation, Bibl., Yates (1936c) 502. 

Crum, W. L., N.R., 437. 

Cumulants, Bibl. : Ayyangar (1938) 444, Cornish 
and Fisher (1937) 453, C. C. Craig (1931c) 
454, Dressel (1940, 1941) 458, Frisch (1926) 
463, Gotaas (1936) 465, Thiele (1931) 494. 
See also k-statistics, Moments. 

Curtiss, J. H., N.R., 216. 

Curve fitting, Bibl.: Elderton and Hansmann 
(1934) 459, Fisher (1912) 461, Jones (1937a) 
472, Kerrich (1935) 473, Koshal (1933, 
1935, 1939) 474, Myers (1934) 479, Nair 
and Shrivastava (1942) 479, Nair and 
Banerjee (1943) 479, K. Pearson (1901c) 483, 
Rhodes (1930) 488, Roos (1937) 489, 
K. Smith (1916) 492, Snow (1911) 492, 
Wald (1940a) 497. See also Least Squares, 
Regression, Trend. 

Curvilinear regression, 145-74. Bibl., Menders- 
hausen (1937a) 477, T. V. Moore (1937) 
478; and see Regression. 

Cycle, 397-8. See Periodicity. 

Cyclical effects, tests for, 124-7, 370. See
Periodicity.


D²-statistic, N.R., 359. Bibl.: Bhattacharya and
Narayan (1942) 446, R. C. Bose (1936a, b)
447, R. C. Bose and Roy (1938c, 1940)
448, S. N. Bose (1935, 1937) 448, Roy
(1939a) 489. See also Discriminatory
Analysis, Multivariate Analysis.

Daly, J. F., on shortest confidence intervals, 82; 
on bias in tests, 323; N.R., 304. 

Daniels, H. E., (Example 23.2) 183-5; rank 
correlations, 441. 

Dantzig, G. B., N.R., 304. 

David, F. N., confidence intervals for correlations,
81; N.R., 304.

Davis, H. T., time-series, 433, 434; N.R., 394,
437. 

Day, E. E., N.R., 245. 

Death rates, Bibl., Farr (1919, 1920) 460, Pearson 
and Tocher (1916c) 485. 

Decomposition of series, Bibl., Anderson (1927)
443, Smirnoff (1935) 491. See also Time- 
series. 

Decreasing functions, Bibl., C. D. Smith (1939) 
491. 

Degrees of freedom, of “Student's” t, 102; of
hypotheses, 270. 

De Lury, D., N.R., 137.

Denumerable probabilities, Bibl., Steinhaus (1923)
492. 

Dependence, see Independence, Correlation. 

Derkson, J. B. D., on stochastic convergence, 440. 

Design, of sampling inquiries, 247-68; preliminary
points, 248-9; stratified sampling,
249-52; design of experiments, 252-4;
orthogonality, 254; replication, 255;
randomisation, 255-6; sensitivity of a
test, 256-7; Latin squares, 257-62; confounding,
262; design and randomisation,
263-6.

Bibl.: Bhattacharya (1943) 446, Christidis
(1931) 451, Fisher (1935c) 462, Jeffreys
(1939e) 471, “Student” (1938) 493, Wold
(1943) 498, Yates (1939e) 502. See also
Blocks, Factorial Experiments, Latin
Squares, etc.

Determinantal equations, Bibl., Girshik (1939)
465. See also Matrix.

Deviance, footnote, 178. 

Difference, of two means, test of (equal variances) 
109-11; (unequal variances) 111-14. See 
also Behrens’ Test, Two Samples. 

———, of two variances, 115-16. 

—— equations, Bibl., Frisch (1932) 463,
Marples (1932) 477. See also
Auto-regression Equations.

Differences of variates, Bibl., Irwin (1937a) 470.

Dilution method, Bibl., R. D. Gordon (1939) 465,
Matuzewski and others (1935) 477.

Dirichlet integrals, 298. 

Discontinuous variates, Bibl.: dell'Agnola (1937)
456, Guldberg (1934) 466, Muench (1938)
478, H. W. Norton (1937) 481, Ottestad 
(1937, 1938) 481. 

Discordant samples, 128. 

Discriminatory analysis, discriminant function, 
341-8. Bibl. : Barnard (1935) 444, Bartlett 
(1939c) 445, Dwyer (1942) 458, Fisher 
(1936a, 1938c, 1939b, 1940d) 462, P. L. Hsu
(1939b, 1941a, 1941c) 469, H. F. Smith
(1936) 492, Travers (1939) 495, Wallace
and Travers (1938) 498, Welch (1939b) 498,
Wilks (1938d) 500. See also Multivariate
Analysis.

Dispersion, Bibl., Norris (1938) 481. See Variance,
etc.

— matrix, 330, 341, N.R., 358.

Dissection of frequency-distributions, Bibl., Burrau 
(1934) 450.

Distributed lags, see Lags. 

Distributions, generally, Bibl.: Ambarzumian
(1937) 443, Baten (1933a) 445, (1934) 446, 
Bispham (1922) 447, Bochner and Jessen 
(1934) 447, Bochner (1937) 447, Bowley 
(1933) 448, Burr (1942) 450, Camp (1937) 
450, Cannon and Wintner (1935) 450,
Chapelin (1932) 451, Cramér and Wold
(1936) 454, Edgett (1931) 458, Eyraud 
(1938a) 459, Glivenko (1933) 465, Guldberg 
(1935) 466, Hansmann (1934) 467, Hartman 
and others (1937) 467, (1939) 468, Haviland 
(1934a, b, 1935, 1939) 468, R. Henderson 
(1907) 468, Jessen and Wintner (1935) 
471, Khintchine (1937a) 473, Kullback 
(1936b) 474, Mazzoni (1934) 477, K. Pearson 
(1923c, 1924a) 485, R. Schmidt (1934) 490, 
von Mises (1939a) 497. 

Dodd, E. L., period generated by moving average, 
384, N.R., 394. 

Doob, J., N.R., 45. 

Dosage-mortality, Bibl., Garwood (1941) 464. 

—— -response, Bibl., Irwin and Cheeseman (1939) 
470. 

Dugué, D., N.R., 45. 

Duration of play, Bibl., de Finetti (1939b) 456,
Fieller (1931a) 460. 


Eden, T., on Fisher's distribution, 206, (Example
23.8) 214, N.R., 216. 

Edgeworth, F. Y., N.R., 45. 

Edwards, J., Integral Calculus, footnotes, 44 and 
50. 

Efficiency, of estimators, 5-7; of maximum 
likelihood estimators, 18-19; of moments 
in fitting Pearson curves, 43-4 ; of sampling, 
Bibl., Yates and Zacopanay (1935c) 502. 

Egg-production, in laying hens, (Table 29.5,
Figure 29.5) 368.

Egyptian skulls, (Example 28.3) 345-8. 

Elasticity of demand, Bibl., Mosak (1939) 478,
Schultz (1933) 490. 

Elderton, E. M., (Example 21.14) 133, N.R., 266. 

Elderton, Sir William P., N.R., 45. 

Electric lamps, testing of, (Example 23.1) 179-80. 

Elimination of variates, in regression analysis, 
167-70. 

Enumeration in sampling, Bibl., Cochran (1939b) 
452. 

Equidetectability, curves of, 318. 

Equimodal distributions, Bibl., Mouzon (1930) 478. 

Error, in variance-analysis, 187. 

Errors, of first and second kind, 270, (Exercise 
26.5) 305. 

—, general theory of, Bibl.: Brelot (1936, 
1937) 449, Campbell (1935) 450, Cramér 
(1928) 454, Deming and Birge (1934) 456, 
Edgeworth (1905, 1906) 458, Jeffreys (1933, 
1937c, 1938d, 1939d) 471, Mahalanobis 
(1922) 476, Wertheimer (1932) 499. See 
also Least Squares. 

Estimation, generally, 1-49, 50-62; in analysis 
of variance, 181, 218-19. 

Estimator, definition, 2; consistence of, 3; bias 
of, 3-4; efficiency of, 5-10; sufficiency of,
7-12; approximation to, 22-4; most
general sufficient form, 24-5; accuracy of, 
28-9; ancillary, 32-3; in multivariate 
case, 33-42; location and scale, 40-2; 
by minimum variance, 50-5; by minimum 
χ², 55-8; by inverse probability, 58-9;
by least squares, 59-60. See also Maxi- 
mum Likelihood, Minimum Variance. 
Bibl.: Aitken and Silverstone (1942)
443, Beall (1939) 446, S. S. Bose and
Mahalanobis (1938b) 448, Darmois (1935,
1936) 455, O. L. Davies and Pearson (1934)
455, Doob (1936) 457, Dugué (1936a, b,
1937b) 458, Fisher (1925b) 461, (1934d,
1938b, d) 462, Geary (1942, 1944) 464,
Halphen (1939) 467, Neyman (1937b) 480,
E. S. Pearson (1937a, 1939) 483, Pitman
(1937b, 1939a) 486, Wald (1939a) 497.

Expectation of life, see Life. 

Expected values, see Mean Values. 

— case, in sociological data, Bibl., Stouffer and
Tibbits (1933) 493.

Expenditure of families, (Example 23.9) 214-15. 

Exponential distribution, (Exercise 26.8) 305-6. 
Bibl., Paulson (1941) 482, Sukhatme (1936b)
493. 

Extra-sensory perception, Bibl., Greenwood and
Stuart (1937) 465, Stevens (1939b) 493.

Extremes, distribution of, Bibl.: Daniels (1941) 
455, de Finetti (1932) 455, Dodd (1923) 
456, Fisher and Tippett (1928a) 461, 
Gumbel (1934, 1935b) 466, McKay (1935)
477, Olds (1935) 481, Tippett (1925) 495. 
See also mth Values. 


F-distribution (variance ratio), Bibl., Merrington 
and Thompson (1943) 478. See Fisher's 
Distribution. 

Factor analysis (psychology), Bibl.: Bartlett 
(1937e) 445, W. Brown (1935) 449, Burt 
(1937a, b, 1938a, b) 450, Camp (1932, 1934)
450, Darmois (1934) 455, Emmett (1936) 
459, Hoel (1937, 1939) 468, Irwin (1933) 
470, Ledermann (1938) 475, Roff (1937) 
489, Thomson (1916, 1919b, 1939) 494,
Thurstone (1935, 1938) 495. 

Factorial experiments, 199-202. Bibl.: Barnard
(1936) 444, R. C. Bose and Kishen (1941) 
448, Cornish (1936, 1940b, c) 453, Goulden 
(1937, 1938) 465, P. L. Hsu (1943) 470, 
Kishen (1940) 473, Wishart (1938) 501, 
Yates (1937b) 502. 

— moments, Bibl., Gonin (1936) 465, Ottestad 
(1939) 481. 

— sums, in fitting regressions, (Example 22.8)
164-5.

Factorisation of variables, Bibl., S. C. Dodd (1927)
457.


Families of alternatives, 275-6. 

Feller, W., N.R., 303. 

Fiducial inference, 85-95. Bibl. : Bartlett (1939a) 
445, Fisher (1933, 1935a, 1935b, 1936c, 
1937b, 1939a, 1940c, 1941a) 462, Garwood
(1936) 464, Ricker (1937) 488, Segal (1938) 
491, Wilks (1938b,c) 499, (1939a, b) 500. 
See Confidence intervals. 

Field experiments, Bibl., Wishart and Saunders
(1935) 501. See Design. 

Fifteen-constant surface, Bibl., K. Pearson (1925a) 
485. 

Filon, L. N. G., N.R., 45. 

Finite populations, sampling from, Bibl.: Church 
(1926) 452, Hansen and Hurwitz (1940) 
467, Irwin and Kendall (1944) 470, Isserlis 
(1918c, 1931) 470, Neyman (1925) 480, 
O'Toole (1934) 481, Sukhatme (1944) 494, 
Tschuprow (1918b, 1921, 1923) 495.

Finney, D. J., z-test, 199 ; test of significance in 
periodogram analysis, 434 ; N.R., 137, 210. 

Fisher, R. A., fitting by moments, 43; fiducial 
probability, 90; tables for Behrens' test, 
92, 93, 111; expansion of “Student's”
integral, 101; tables of z, 102; difference
of two means, 110; z-distribution, 116,
117; configuration of a sample, 127;
fitting regressions, 165; theorem on sum
of squares, 176-7; design of experiments,
263; discriminatory analysis (Example 
28.2) 342-4; distribution of canonical 
correlations, 357 ; significance of a periodo- 
gram, 434; N.R., 45, 61, 83, 94, 136, 173, 
216, 245, 266, 359. 

Exercises from: (Exercise 17.1) 45, 
(Exercises 17.4, 17.5, 17.6) 46, (Exercises
17.12, 17.15, 17.16) 48, (Exercise 17.19) 49,
(Exercise 18.3) 61, (Exercises 20.1, 20.2) 
94-5. 

Fisher’s distribution (z-distribution), properties of, 
116-18; in variance analysis, 179, 199; 
in non-normal case, 205-6, 234-6, (Example 
26.8) 289-91; in linear hypothesis, 301; 
in discriminatory analysis, 345. 

Bibl.: Aroian (1941) 444, R. A. Chap- 
man (1938) 451, Cochran (1940a) 452, 
Daniels (1938a) 454, Eden and Yates (1933) 
458, Fisher (1924c) 461, P. L. Hsu (1941c) 
469, Lawley (1938) 475, McCarthy (1939) 
477, Paulson (1942) 482, Welch (1937) 498. 

Fitting, see Curve Fitting, Least Squares. 

Flood flows, Bibl., Gumbel (1938a, 1941) 466.

Fluctuations in time-series, Bibl., R. A. Gordon
(1937) 465. See Time-series. 

Forecasting, Bibl.: Cowles (1933) 453, Cowles and 
Jones (1937) 453, de Finetti (1937) 456, 
Schultz (1930) 490, Yates (1936c) 502. 

Forsyth, A. R., Calculus of Variations, footnote, 50.

Fourier analysis, see Harmonic Analysis, Periodicity.

Fragmentary samples, Bibl., Wilks (1932a) 499. 

Frankel, L. R., N.R., 136, 266. 

Freedom, degrees of, see Degrees of Freedom. 

Frequency-distributions, see Distributions. 

Frequency theory of probability, Bibl.: Campbell 
(1939) 450, Cantelli (1923, 1932, 1933b) 450, 
(1936) 451, Dörge (1934, 1936) 458, von
Mises (1931) 497. See Probability, Random 
Sequence. 

Friedman, M., (Example 23.9) 214-15. 

Frisch, R., N.R., 358. 


Galton’s problem, Bibl.: Galton (1902) 464, Irwin 
(1925a) 470, K. Pearson (1902c) 484. See 
Rank Correlation. 

Gamma distribution, Bibl., Kibble (1941) 473. 
See Type III. 

Garwood, F., confidence intervals for Poisson dis- 
tribution, 81. 

Gauss, K. F., variance of residuals, 60-1; stan- 
dard errors, 153; N.R., 45. 

Gaussian distribution, see Normal Population. 

Geary, R. C., distribution of t, 102-4; test of 
normality, 106 ; theorem on independence, 
118; (Exercises 21.1, 21.2) 137-8; N.R.,
45, 136. 

Geary's ratio, Bibl., Geary (1935a, b, 1936a) 464,
Tricomi (1937) 495. 

General factor (intelligence), see Factor Analysis. 

Generalised distance, of Mahalanobis, N.R., 359. 

Generating functions, Bibl., Aitken (1931) 442. 
See Characteristic Functions. 

Geometric Mean, Bibl., Camp (1938a) 450, Norris 
(1938, 1940) 481. 

Germination of wheat-seeds, (Example 23.7) 207-9. 

Gini's mean difference, 108. 

Girshik, M. R., (Exercise 28.11) 362, N.R., 359. 

Glass, seed in, (Example 23.6) 202-4. 

Goodness of fit, tests of, 106-9. Bibl.: David 
(1939) 455, Neyman (1937a) 480, K. Pear- 
son (1934) 486, Thomson (1919a) 494. See 
Chi-squared. 

Gosset, W. S. (“Student”), 80, 266, N.R., 394.

Gould, C. E., (Example 23.6) 202-4.

Goulden, C. H., N.R., 216, 266. 

Grades, see Rank Correlation, Galton's Problem. 

Graduation, Bibl., Aitken (1933a, b, c) 442, Key- 
fitz (1938) 473. See Interpolation, Least 
Squares, Orthogonal Polynomials, Trend. 

Graeco-Latin square, 261-2. Bibl., R. C. Bose 
(1938b) 448.

Gram-Charlier series, estimation in (Exercise 18.1)
61; for non-normal t, 103; goodness of
fit in, 109; in z-distribution, 116. Bibl.:
Aitken and Oppenheim (1931) 442, Aitken
(1932) 442, Aroian (1937) 444, Baker
(1930d, 1935) 444, Charlier (1906, 1912,
1928, 1931) 451, Cornish and Fisher (1937)
453, C. C. Craig (1931b) 454, Cramér (1926,
1935b) 454, Doetsch (1934) 457, Edgeworth
(1905) 458, Gram (1879) 465, Hildebrandt
(1931) 468, Jacob (1933, 1935, 1937) 471,
Meisener (1938) 477, Quensel (1938) 487,
Samuelson (1943) 490, Schmidt (1934) 490,
Steffensen (1930) 492, Wicksell (1917b,
1934a) 499.

Greenstein, B., N.R., 437. 

Grouping corrections, Bibl.: Abernethy (1933)
442, Alter (1939) 443, Baten (1931) 445,
Blümel (1939) 447, Burkhardt and Stackelberg
(1939) 449, Carver (1933, 1936) 451,
C. C. Craig (1936c, 1941b) 454, Elderton
(1933, 1938b) 459, Kendall (1938a) 472,
Lewis (1935) 475, Sandon (1924) 490.

—, effect on correlations, Bibl., Gehlke and
Biehl (1934) 464.

—, significance of, Bibl., Stevens (1937b) 493. 

Groups of experiments, Bibl., Yates and Cochran
(1938b) 502.


Hampton, W. M., (Example 23.6) 202-4. 

Hansmann, G. H., N.R., 45. 

Harmonic analysis, Bibl.: T. F. Anderson (1935)
443, Brunt (1928) 449, Carslaw (1930) 451, 
Fisher (1929a) 461, (1940a) 462, Frisch 
(1928, 1931, 1933) 463, Pollak (1926) 487, 
Turner (1913) 496, Wiener (1930) 499. 
See Periodicity. 

—— mean, Bibl., Norris (1939) 481. See Mean 
Values. 

Hartley, H. O., on z-test, 199; k samples, 299 ; 
N.R., 137, 216, 304. 

Heads and tails, Bibl., Fieller (1931c) 460. See 
Duration of Play. 

Hendricks, W. A., (Exercise 21.9) 139 ; N.R., 136. 

Hermite polynomials, see Tchebycheff-Hermite 
Polynomials. 

Heterogeneous populations, Bibl., Baker (1930c, 
1932) 444. See also Lexis Theory, Strati- 
fied Sampling. 

Hierarchies in correlation, Bibl., Thomson (1916, 
1919b, 1935) 494, Wilson (1928) 500. See 
Factor Analysis. 

Higham, J. A., (Exercise 29.7) 395.

Highest audible pitch, (Example 22.4) 152-3, 
(Example 22.5) 155-6. 

Hirschfeld, H. O., see Hartley, H. O. 

Homogeneity, Bibl.: Baker (1941) 444, Hartley 
(1940) 467, Welch (1938a) 498. See k 
samples. 

Horse population and wheat prices, 436. 

Hotelling, H., canonical correlations, 348-58; 
(Exercises 28.7-28.10) 360-2; N.R., 45,
136, 359.

Hotelling’s T, 323, 335-8; N.R., 359. Bibl.,
Hotelling (1931) 469, P. L. Hsu (1938c) 469. 

Hsu, P. L., linear hypothesis, 301; Wishart’s 
distribution, 333; canonical correlations, 
357; N.R., 304, 359. 

Hypergeometric series, Bibl.: Ayyangar (1934)
444, Camp (1925a) 450, O. L. Davies (1933,
1934) 455, Gonin (1936) 465, K. Pearson
(1899b) 483, (1924b, c) 485, Romanovsky
(1925b) 489. 

Hypotheses, testing of, see Statistical Hypotheses. 


Imaginary random variables, Bibl., Eyraud (19380) 
459. 

Immunity, Bibl., Brownlee (1905) 449. 

Incomes, distribution of, Bibl., Cantelli (1929)
450, Darmois (1933) 455. 

Incomplete blocks, see Blocks. 

Independence, of quadratic forms, Bibl.: Cochran
(1934) 452, A. T. Craig (1936a, 1943) 453, 
Madow (1940) 476. 

—, statistical, Bibl.: del Vecchio (1933) 456,
Kac and van Kampen (1939) 472, Marcinkiewicz
and Zygmund (1937) 477, Tschuprow
(1934) 496. See also Correlation,
Contingency, etc.

Index, distribution of, see Ratio. 

——— numbers, Bibl.: Bowley (1926) 448, Clare- 
mont (1916) 452, Crowther (1934) 454, 
Dodd (1937c) 457, Edgeworth (1925a, b, c)
459, I. Fisher (1922) 460, Flux (1921, 1933)
463, Frickey (1937) 463, Frisch (1930) 463,
Haberler (1927) 467, Konüs (1939) 474,
Persons (1928) 486, Rhodes (1936) 488, 
Schultz (1939) 490, Yates (1939c) 502. 

Indices, correlation of, Bibl. : Baker (1937) 444, 
J. W. Brown and others (1914) 449, Clare- 
mont (1916) 452. 

Industrial accidents, Bibl., Newbold (1927) 479. 

— processes, see Quality Control.

Inequalities, Bibl. : Mortara (1934) 478, Narumi 
(1923b) 479, Norris (1935, 1937) 481,
Romanovsky (1938) 489, Shohat (1929)
491, C. D. Smith (1930) 491, von Mises
(1939b) 497, Wald (1938) 497.

Infantile mortality, Bibl., Feld (1924) 460.

Infection in potatoes, (Example 24.5) 230-2, 
(Example 24.6) 232-3. 

Inference, see Statistical Hypotheses. 

Information, amount of, 29-30; loss of, 30-2; 
in minimum χ², 57-8. Bibl.: Bartlett
(1936a, b) 445, Fisher (19340, 1935a) 462. 

Intensity, of a periodogram, 425. 

Interaction, in variance-analysis, 187, 188-9. 

Interference, analysis of, Bibl., Stevens (1936) 493. 

Interpolation, Bibl.: Comrie (1936) 452, Erdös
and Turan (1937, 1938) 459, Feldheim
(1936a) 460, Fisher and Wishart (1927)
461, Gini (1921) 465, Lidstone (1937) 476,
Pietra (1932b) 486, Salvemini (1934) 490,
Simaika (1942) 491, Tchebycheff (1907)
494. See also Graduation, Least Squares, 
Orthogonal Polynomials. 

Intra-class correlation, 181. Bibl., Harris (1914)
467, Harris and Gunstad (1931) 467. 

Intrinsic accuracy, in estimation, 28-9. 

Invariants of frequency curves, Bibl., Zoch
(1934) 503. 

Inverse probability, in estimation, 58-9 ; relation- 
ship with fiducial inference, 90-1, 93-4. 
Bibl.: Bayes (1763) 446, Fisher (1926c, 
1930a) 461, (1932, 1935a) 462, Isserlis (1936) 
471, Jeffreys (1937b) 471, Tornier (1937)
495, Wisniewski (1937b) 501.

Iris (flower), (Example 28.2) 342-4.

Irregular Kollektiv, 123. See Random Sequence. 

Irwin, J. O., (Exercise 23.1) 216-17; sampling 
moments, 440; N.R., 216. 

Item analysis, Bibl., Merril (1937) 478. 

Iterations, see Runs. 


J-shaped distributions, Bibl., Elderton (1933) 
459, Solomon (1939) 492. 

Jackson, W. R., N.R., 304. 

Jeffreys, H., (Example 18.5) 56-7;  fiducial 
inference, 90-1, 93-4; N.R., 61, 94, 266.

Jensen, A., N.R., 266. 

Joint sufficiency, 39. 

Judgments, validity of, Bibl., Eysenck (1939) 459. 


k samples, problem of, 119-22, 295-9; bias in, 
323, (Exercise 27.2) 326. Bibl.: Bartlett 
(1934a) 445, Bishop (1939) 447, Bishop and 
Nair (1939) 447, R. C. Bose and Roy (1940)
448, G. W. Brown (1939) 449, Neyman
and Pearson (1931b) 480, Pearson and
Wilks (1933b) 482, Sukhatme (1936b) 493,
(1937b) 494, Welch (1935) 498, Wilks
(1935b) 499. See L-tests.

k-statistics, Bibl.: Fisher (1929b) 461, Fisher and
Wishart (1931) 462, C. T. Hsu and Lawley
(1939) 469, Kendall (1940) 472, (1942b) 473,
Wishart (1929a, b, 1930, 1933b) 500. See 
also Moments, sampling. 

Kelley, T. L., (Example 28.4) 351-2. 

Kermack, W. O., N.R., 136. 

Keynes, Lord, (Exercise 17.7) 47. 

Kolmogoroff, A., confidence intervals for terminals,
83.

Kolodzieczyk, St., linear hypothesis, 203; N.R., 
304. 

Koopman, B. O., (Exercises 17.13, 17.14) 48, 
N.R., 45. 

Koshal, R., N.R., 45.


Kronecker delta, 329. 


Kurtic curve, 142.

Kurtosis, Bibl., Frisch (1934a) 464.


L-tests, Bibl.: Mahalanobis (1933) 476, Mood
(1939) 478, Nayer (1936) 479, Paulson
(1941) 482, Welch (1936a) 498, Wilks and
Thompson (1937a) 499. See k samples.

Lag correlation, 435-6.

Lags, distributed, Bibl.: Alt (1942) 443, Koopmans
(1941) 474, K. R. Nair (1936) 479,
Zrzavy (1933) 503.

Lanarkshire milk investigation, N.R., 266.

Large numbers, law of, see Convergence in Probability.

(1938b) 448, R. C. Bose and Nair (1942b)
448, Euler (1782) 459, Fisher and Yates
(19340) 462, Fisher (1942d, e) 462, Mann
(1943) 477, H. Norton (1939) 481, Stevens
ND MS ath (1997) dS Yates (1933c)

Adcock (1878) 442, Aitken (1933a, b, c,
1935a) 442-3, Davis (1933) 456, David and
Neyman (19380) 466, Deming (1931, 1934,
19%, 1937) 456, Hendricks (1931, 1934)
468, E. Johnson (1940) 471, Jones (1937a)
472, Jordan (1932, 1934) 472, Kerrich (1937)
473, Sheffer (1935) 491, Sheppard (1914,
1929) 491, Storno (1934) 493, Wisniewski
(1937a) 501, Wong (1935) 501.

Lexis, W., ratio, ED N.R., 216.

—: Geiringer (1942) 465, Rider
(1934) rs Tschuprow GM 19192) 495,
von Bortkiewicz (1931) 497.

of, etc., Bibl.: Brownlee and
(1911) 449, Dublin and others

Likelihood,
- E in testing hypotheses, 277-80, 295-
462,

Linear equations subject to error, Bibl., Lonseth
(1942) 476.

— hypotheses, 292-5, 300-2. Bibl., Johnson
and Neyman (1936) 472, Kolodzieczyk
(1935) 474.

Linearity of regression, see Regression.

Linkage, Bibl., Finney (1940, 1941, 1942) 460,
N. L. Johnson (1940b) 472.

Link-relatives, Bibl., Robb (1930) 489. See Index
Numbers.

Live births, proportion of males among, (Example
21.8) 120.


Location, estimation of parameters of, 40-2;
centre of, 41; Pitman's tests of, 323-0.
Bibl., Pitman (1939a, b) 486.

Logarithmic variate, Bibl.: Finney (1941b) 460,
Jenkins (1932) 471, Nydell (1919) 481,
Pao-Tsi-Yuan (1933) 481, Quensel (1936)
487, Wicksell (1917a) 499, Williams (1937).

Loss of information, in estimation, 30-2.

— — weight in soil, (Example 22.3) 149-52,
(Example 22.6) 158.


m rankings, problem of, (Example 23.9) 214-15.
Bibl., Friedman (1937, 1940) 463, Kendall
and Babington Smith (1939b) 472.

Macaulay, F. R., (Exercise 29.4) 395; N.R., 394. 

MacStewart, W., N.R., 304.

Madow, W. G., N.R., 359.

Magnetic declination, Bibl., Schuster (1899) 490.

Magnitude, random division of, Bibl., Fisher
(1940a) 462, Stevens (1939a) 493.

Mahalanobis, P. C., N.R., 303, 304, 359.

Males, proportion in births, (Example 21.8) 120 ; 
marriages of, (Example 21.9) 121-2. 
Markoff, A. A., theorem on least squares, (Exercise
25.5) 267.

— process (Markoff chains), Bibl.: Doeblin
(1936, 1937) 457, Elfving (1937, 1938) 459,
Feldheim (1936b) 460, Fortet (1935-8) 463,
Fréchet (1935, 193605, 1937a) 463, Geiringer
(1938) 464, Hadamard and Fréchet (1933)
467, Hostinsky (1937) 469, Kolmogoroff
(1937b) 473, Lévy (1935b, 1936c) 476,
Markoff (1912) 477, Mihoc (1934) 478,
Onicescu and Mihoc (1935-9) 481, Romanovsky
(193602) 489, Séukarev (1932) 490.

Marriage, males according to age at, (Example
21.9) 121-2.

— rate in England and Wales, (Table 30.2) 397, 
(Resp 30.3, Table 30.5, Figure 30.4) 


Meri, K. E. 5. N.R., 359. 
Mass production, see Quality Control.

Matching problems, Bibl.: Battin (1942) 446,
D. W. Chapman (1935) 451, J. A. Greenwood
(1938) 465, (1940) 466, Greville (1938,
1941) 466, Olds (1938a) 481, Vernon (1936)
496, Wilks (1932c) 499.

Mathematical Tripos, distribution of women
obtaining firsts in, (Example 18.5) 56-7.

Matrix, arithmetic of, Aitken (1937a, b, 1938) 443,
Bingham (1941) 447, Dwyer (1941a, b) 458,
Hotelling (1943) 469.

Maximum likelihood estimators, 12-49; consistence,
13-15; normality, 15-17; variance
of, 17-18; efficiency of, 18-19; sufficiency,
19-20; for several parameters, 34-49;
variance and covariance of, 36-7; relation
with minimum variance, 53, and with confidence
intervals, 73-4.

Bibl.: Carlson (1932) 451, Fisher (1912,
1921a, 1925b, 10258c) 461, (1932, 1934)
462, Hotelling (1930) 469, Jeffreys (1938b,
1938c) 471, Koshal (1933, 1935, 1939) 474,
Myers (1934) 479, E. S. Pearson (1937a)
483, K. Pearson (1936) 486, Welch (19390)
499.

McKendrick, A. G., N.R., 136.

Mean, arithmetic, estimation of, 2; (Example
17.6) sufficient estimator for, 11; (Example
17.7) 19-20; most general distribution for
which it is estimator (Example 17.10) 22;
significance of, 98-100, (Examples 27.1,
27.2) 311-12.

— deviation, in testing normality (Geary's
ratio), 106; distribution of md, Bibl.:
Fisher (1920) 461, Fréchet (103¢a) 463,
Tricomi (1936b, 1937) 495.

— difference, 108. Bibl.: Cantelli (1913) 450,
de Finetti and Paciello (1930b) 455, de
Finetti (1931) 455, U. S. Nair (1936) 479,
Wold (1935) 501.

— values, Bibl.: Aumann (1934-5) 444, Bunak
(1936) 449, A. T. Craig (1936a) 453, Dodd
(1934, 1937a, b, e, 1938) 457, Doodson (1917)
458, Dressel (1941) 458, Norris (1935, 1937)
481, Wertheimer (1937) 499, Yasukawa
(1925) 501, Zoch (1935, 1937) 503.

Means, distribution of, Bibl.: Baker (1930a, 1931,
1932, 1030, 1940) 444, Behrens (1929) 446,
R. C. Bose (1938a) 448, Carlson (1932) 451,
Cochran (19370) 452, A. T. Craig (1932)
453, Dodd (1926-7) 456, Dunlap (1931) 458,
Hall (1927b) 467, Holzinger and Church
(1929) 469, Irwin (1927, 1929a, 1930) 470,
Immer (1937) 470, Isserlis (19180) 470,
Jeffreys (1940) 471, Kolmogoroff (1929) 473,
Pizzetti (1930) 487, Pollard (1934) 487,
Rhodes (1927) 488, Romanovsky (1929)
489, Simon (1943) 491, Truksa (1940) 495.
See also Central Limit Theorem, Mean
Values.

—, test of difference, see Difference; in multivariate
analysis, 338-41.


Mean-square contingency, see Contingency.

— — successive difference, Bibl.: Hart (1942)
467, von Neumann and others (1941a, b)
497, J. D. Williams (1941) 500.

Median, as estimator, 5; confidence intervals for,
(Exercise 19.5) 84. Bibl.: Cisbani (1938)
452, Doodson (1917) 458, Gini and Galvani
(1929) 465, Gini (1938) 465, Gini and
Zappa (1938) 465, Gulotta (1938) 466,
Haldane (1942b) 467, Hojo (1931, 1933)
469, Jackson (1921) 471, K. R. Nair (1940b)
479, K. Pearson (1931b) 486, Pollard (1934)
487, Savur (1937a) 490, W. R. Thompson
(1936) 494, Ville (103660) 496.

Migration, see Random Migration.

Minimum variance, of maximum likelihood estimators,
18-19; in estimation, 50-6.

— χ², in estimation, 55-8.

Missing plot technique, 220-33. Bibl.: Allan
and Wishart (1930) 443, Cornish (1940a, b)
453, K. R. Nair (1940a) 479, Yates (1933b)
501, Yates and Hale (1939b) 502.

Mode, Bibl.: Doodson (1917) 458, Haldane
(1942b) 467, K. Pearson (1902b) 484,
Yasukawa (1926) 501.

Moment-function, Bibl., U. S. Nair (1939) 479.
See Characteristic Functions, Generating
Functions.

Momonta, officienoy of, 43-4. 

— of distributions (specification), Bibh? Corn» 

ish and Fisher (1937) 453, Fisher (19370) 

462, R. Henderson (1907) 468, O'Toole 

(1033) 481, Poarl (1037) 482, K. Pearson 

(1938) 486, Romanoveky (19305) 480, von 

Mises (1937) 497, See Curve Fitting. 

of, Bibl, : HBódowadt (1036) 447, 

Broggi (1934) 449, Chlodoveky (1938) 451, 

Hamburger (1920, 1921) 467, Hauwdorf 

(1923) 468, Haviland (1935, 1036) 468, 

Marcinkiowiez (1039) 477, Pólya (1020, 

10380) 487, Stekloff (1014) 402, Stioltjes 

(1915) 493, Widdor (1934) 499. 

—, sampling, Bibl.: Bernstein (1092) 446,
C. C. Craig (1928) 453, (1940) 454, Dwyer
(1937a, 1035, 1940) 458, Fisher (10294)
461, Fisher and Wishart (1931) 462, Geary
(1933) 464, Irwin and Kendall (1944) 470,
Isserlis (1918b, c, 1931) 470, St. Georgescu
(1932) 493, Sukhatme (1938, 1944) 494,
Tschuprow (1918b, 1921, 1923) 495, Wilke
(1924, 1934) 499, Wishart (1929a, b, 1930,
1931a, b, 1933b) 500, Wishart and Bartlett
(1932b) 500, Ziaud-din (1938) 503. See
also k-statistics.

Monotonic functions, in distribution theory, Bibl.,
Bochner (1937) 447.

Mood, A. M., N.R., 304.

Moore, G., phases in time-series, 126; N.R., 136.

Morant, G., N.R., 394.

Morgan, W. A., N.R., 137.

Mortality, see Life. 

Most-efficient estimator, 6, 10, 18-19. 

Most-selective confidence intervals, 75, 82. 

Moths, effect of weather on, (Example 22.10) 
171-2. 

Moving averages, 372-87, 399. Bibl.: Dodd
(1939a, 1941a, b) 457, Frisch (1938) 464,
Wold (1938b) 501.

mth values, Bibl., Gumbel (1934, 1935a, 1939)
466.

Multinomial distribution, Bibl., Kullback (1937)
474, Lurquin (1937) 476.

Multiple correlation, Bibl.: Bacon (1938) 444,
R. C. Bose (1934) 447, Fisher (1928b) 461,
Hall (1927a) 467, Kelley and McNemar
(1929) 472, Kullback (1936c) 474, K. Pear-
son and Lee (1908) 484, K. Pearson (1916d)
485, K. Pearson and Young (1918) 485,
Soper (19294) 492, Starkey (1939) 492,
Tappan (1927) 494, Wilks (1932b) 499,
Wishart (1931b) 500, Wong (1937) 501.

——— curvilinear regression, 167, 236. See Re- 
gression. 

— happenings, Bibl., Greenwood and Yule
(1920) 466, K. Pearson (1912b, 1913) 484.
See Poisson Distribution, Pólya Distribu- 
tion. 

Multivariate analysis, 328-62; Wishart’s distri- 
bution, 330-4;  Hotelling's distribution, 
335-8; significance of set of means, 338- 
41; discriminatory analysis, 341-8; 
canonical correlations, 348—58. 

Bibl.: Bartlett (1939b, 1941) 445, Bishop
(1939) 447, Fisher (1936a, b, 1938c, 1939,
1940d) 462, Hotelling (1933, 1936a, b) 469,
P. L. Hsu (1939b, 1941a, c, d) 469, Madow
(1937, 1938) 476, Mahalanobis (1930, 1936a)
476, Mahalanobis and others (1936b) 476,
Martin (1936) 477, Rider (1936) 488, Roy
(1938, 1939a, b, 1942a, b) 489, Simonsen
(1937) 491, Wald and Brookner (1941b)
498.

——— distributions, estimation in, 33-7 ; normal, 
see Normal. Bibl.: Leser (1942) 475, 
Lukomski (1939) 476, Mahlmann (1935) 
477. See also Multiple Correlation. 

Myers, R. J., N.R., 45. 


Nair, K. R., confidence intervals for median, 81, 
N.R., 83. 

Nayer, P. N., testing hypotheses, 299 ; N.R., 304. 

Negative binomial, Bibl., Fisher (1941b) 462,
Greenwood and Yule (1920) 466. See Pólya
Distribution.

Neyman, J., confidence intervals, 75-6; Behrens’
test, 93; randomised blocks, 214; theory
of tests, 270, 299, 308, 311, 323; Exercises
from: (Exercises 19.2, 19.3) 83, (Exercise 
21.12) 140, (Exercises 26.2, 26.3) 304, 
(Exercises 26.4, 26.5) 305, (Exercise 27.3) 
327. N.R., 45, 83, 94, 136, 172, 266, 303, 
304, 320. 

Nisbet, S. D., (Example 25.1) 258-9. 

Non-central confidence intervals, 66. 

—— —— t, Bibl., N. L. Johnson and Welch
(1940a) 471.

Non-normal data, in variance-analysis, 205-15.

—— populations, Bibl.: Baker (1934) 444,
Bartlett (1935a) 445, C. C. Craig (1941a)
454, Geary (1936b) 464, Laderman (1939)
474, A. N. K. Nair (1942) 479, Pearson and
Adyanthaya (1928, 1929) 482, E. S. Pearson
(1931b) 482, Rider (1931a) 487, Rietz (1932,
1939) 488, Thorndike (1937) 494.

Non-orthogonal data, Bibl.: K. R. Nair (1942)
479, Wilks (1938e) 500, Yates (1934a) 501.

Non-parametric tests, 322. Bibl., Scheffé (1943) 
490. 

Non-random samples, Bibl., “Student” (1909)
493. 

Nonsense correlations, Bibl., Yule (1926) 503. 

Normal equations, solution of, Bibl., Hoel (1941)
468.

—— population, estimation of mean, 2, (Example
17.6) 11, (Example 17.7) 19-20, (Example
18.1) 51; estimation of variance, (Example
17.6) 11, (Example 18.4) 54-5; centre of
location of, (Example 17.22) 42; confidence
intervals for mean, (Example 19.1) 63-4,
(Example 19.3) 70; fiducial distribution,
85; bivariate, (Example 17.17) 33-4,
(Example 17.18) 37-8; regressions of,
(Example 22.1) 144.

Bibl.: Baker (1931) 444, Bergström
(1918) 446, Cramér (1923, 1936) 454, Erdős
and Kac (1939) 459, Haldane (1942a, b)
467, C. T. Hsu (1940, 1941) 469, Isserlis
(1918b) 470, Kac (1939) 472, Khintchine
(1935) 473, Kullback (1935a) 474, Leder-
mann (1939) 475, Lehmann (1939) 475,
Lengyel (1939) 475, K. Pearson (1924c) 485,
Pólya (1923) 487, Raikov (1938) 487,
Rhodes (1928) 488, Tricomi (1935, 1936a,
1936b) 495, Yule (1938b) 503.

Normalisation of frequency functions, Bibl. : 
Cornish and Fisher (1937) 453, Haldane 
(1938) 467, Mahalanobis and others (1936b)
476, Paulson (1942) 482. 

Normality, tests of, 105-6. Bibl. : Fisher (1930b) 
461, Geary (1935a,b, 1936a) 464, Geary 
and Pearson (1938) 464, E. S. Pearson 
(1930, 1935c) 482, Yasukawa (1934) 501. 

Nuisance parameters, 134. Bibl., Hotelling (1940)
469.

Olds, E. G., N.R., 266.

Omega, for testing goodness of fit, 107-9. Bibl.,
Smirnoff (1936) 491.

One-sided confidence intervals, 76. 

Oppenheim, S., N.R., 437. 

Order, in random series, 122-4, and see Random
Order.

Orthogonal data, in variance-analysis, 219, 254.

—— polynomials, 146-54, 159-67. Bibl.: Aitken
(1932, 1933a, b, c) 442, Allan (1930) 443,
Dieulefait (1934b) 456, Fisher (1921b, 1924)
461, Greenleaf (1932) 465, Jackson (1934,
1937, 1938) 471, Jordan (1932) 472, Lidstone
(1933) 476, Romanovsky (1927) 489, San-
sone (1933) 490, Shohat (1935) 491, C. D.
Smith (1939) 491, Tartler (1935) 494,
Tchebycheff (1907) 494, Webster (1938)
498, Wishart (1933a) 500, Wong (1935) 501.

—— transformations, Bibl., Landahl (1938) 474,
Ledermann (1938) 475.

Oscillations, in time-series, 369, 370, 380, 397-8.
See Periodicity.


p-statistics, Bibl., Roy (1939b, 1942a) 489. See
Multivariate Analysis. 

P;, test, see Combination of Tests. 

Paired comparisons, Bibl., Kendall and Babington 
Smith (1940) 472. 

Parameters, estimation of, see Estimation. 

—— of location and scale, 40-2.

Partial correlations, Bibl.: Isserlis (1914, 1916) 
470, Stouffer (1934) 493, Subramanian 
(1935) 493. 

Pasteurised milk, in feeding, (Example 21.14) 133. 

Path coefficients, Bibl., Engelhart (1936) 459,
Wright (1934) 501. 

Paulson, E. A., z-distribution, 118 and N.R., 136. 

Peaks, in time-series, 124. 

Pearson distributions, moments in fitting, 43-4 ; 
sufficient estimators in (Exercise 17.18) 49. 
Bibl.: Ambarzumian (1937) 443, Baker 
(1940) 444, Beale (1937) 446, C. C. Craig 
(1935b) 454, Dieulefait (1935b) 456, Fisher
(1921a) 461, Hildebrandt (1931) 468, Irwin
(1930) 470, K. Pearson (1894, 1895, 1901b)
483, (1916a) 484, (1924a) 485, Romanovsky 
(1924) 489, Wishart (1926) 500. See also 
Type I, etc. 

Pearson, E. S., confidence intervals for binomial,
81; t in non-normal case, 103; test of
normality, 106; z in non-normal case,
205; (Exercise 23.4) 216-17; analysis of
covariance, 238; (Exercises 26.2, 26.3, 26.4,
26.6) 304-5; N.R., 45, 83, 136, 137, 245,
266, 303, 304, 359.

—, K., (Example 21.14) 133; N.R., 45, 137,
172, 173, 394.


Peas, yields of, (Example 23.5) 200-2. 

Periodicity and periodogram analysis, 423-5, 
432-3, 433-5. Bibl.: Alter (1924, 1925, 
1926a, b, 1933, 1937) 443, Beveridge (1921, 
1922) 446, Bradley and Crum (1939) 449, 
Brownlee (1924b) 449, Bruns (1921) 449,
Brunt (1925, 1928) 449, Buys-Ballot (1847) 
450, J. I. Craig (1916) 454, Crum (1923, 
1925) 454, Dodd (1930) 456, (1939a, b, 
1941a, b) 457, Frisch (1928, 1931, 1933) 
463, Greenstein (1935) 465, Hersch (1934) 
468, Kalecki (1935) 472, Koopmans (1940) 
474, Kuznets (1929, 1933) 474, Larmor and 
Yamaga (1917) 475, Mitchell (1913) 478, 
Mitchell and Burns (1935) 478, Moore (1914, 
1923) 478, Moulton (1938) 478, Oppenheim 
(1909) 481, Pietra (1925) 486, Pollak (1927) 
487, Pollak and Kaiser (1935) 487, Powell 
(1930) 487, Savur (1941) 490, Schuster 
(1898, 1899, 1906) 490, Soper (1929b) 492,
Starkey (1939) 492, Stumpff (1926, 1937) 
493, Tinbergen (1937, 1938) 495, Tintner 
(1935) 495, Trachtenberg (1921) 495, Vinci 
(1934) 496, Walker (1914, 1925, 1927, 1931) 
498, Wallis and Moore (1941) 498, Yule 
(1927a) 503. See also Harmonic Analysis,
Time-series.

Phases, in time-series, 124, 125-6.

Pilot sampling, 252, N.R., 266. 

Pitman, E. J. G., tests of significance, 128-32, 
136; z-test, 211; tests of hypotheses, 
323-6; Exercises from, (Exercises 17.9, 
17.10, 17.11) 47, (Exercise 21.3) 138, 
(Exercise 21.15) 140, (Exercise 27.2) 326. 
N.R., 45, 137, 210. 

Plant breeding, Bibl., Y. Tang (1938) 494.

Plot arrangements, Bibl., Tedin (1931) 494. See
Design.

Poisson distribution, (Example 17.9) 21-2; con-
fidence intervals for, (Example 19.4) 70-1, 
81; conditional test for, (Example 21.12) 
127; in variance-analysis, 206-7. 

Bibl.: Ackermann (1939) 442, R. A. 
Chapman (1938) 451, Cochran (1936a, 
1940b) 452, Copeland and Regan (1936) 453, 
Doetsch (1934) 457, Fisher and others 
(1922c) 461, Garwood (1936) 464, Irwin 
(1935, 1937a) 470, Lévy (1937a) 475, Lüders 
(1934) 476, Molina (1942) 478, Poisson (1837) 
487, Przyborowski and Wilénski (1940) 487, 
Raikov (1936) 487, Ricker (1937) 488, 
Satterthwaite (1943) 490, “ Student ” (1907, 
1919) 493, Sukhatme (1937b, 1938a) 494, 
von Bortkiewicz (1898, 1910) 496, Weida 
(1935) 498, Whitaker (1914) 499. 

Poisson's theorem in probability, Bibl., Bochner
(1936) 447, Bonferroni (1933) 447. See
Central Limit Theorem.

Pólya distribution, Bibl., del Chiaro (1936) 456,
S. Guldberg (1935) 466. See Negative
Binomial.

Polychoric correlations, Bibl., Pearson and Pearson
(1922b) 485, Ritchie-Scott (1918) 489.

Polynomials, expansions in, Bibl., Cacciopolli 
(1932) 450, Davis (1933) 455. See Ortho- 
gonal Polynomials, Curve Fitting. 

Population of England and Wales, (Example 
22.7) 161-3, (Examples 22.8, 22.9) 164-7,
(Table 29.2, Figure 29.2) 365. 

— analysis, Bibl.: Lotka (1938, 1939) 476, 
Pearl and Reed (1923) 482, Volterra (1936) 
496. 

Potato yields, (Example 21.11) 126. 

Power of a test, 272, 307-8. Bibl. : G. W. Brown 
(1939) 449, Dantzig (1940) 455, Eisenhart 
(1938) 459, MacStewart (1941) 476, Simaika 
(1941) 491, P. L. Hsu (1941b) 469, P. C.
Tang (1938) 494. See also Statistical 
Hypotheses. 

Powers of normal variates, Bibl., Haldane (1942a) 
467. 

Prediction, see Forecasting. 

Pretorius, S. J., N.R., 173.

Principal components, Bibl.: Girshik (1936) 465,
Hotelling (1933, 19362) 469, Landahl (1938) 
474, Ledermann (1938) 475, Thurstone 
(1935) 495. 

Probability, Bibl.: Bartlett (1933b) 445, Beck
(1936) 446, Belardinelli (1934) 446, Borel
(1939) 447, Broderick (1937) 449, Cantelli
(1932, 1933b) 450, Castelnuovo (1932) 451,
Cramér (1937, 1938, 1939) 454, de Finetti
(1933a, b, 1939a) 456, Doeblin (1938) 457,
Doob (1934b, 1941) 457, Eggenberger (1924)
459, Erdélyi (1937) 459, Khintchine (1937b)
473, Kolmogoroff (1931, 1933a) 473, Lévy 
(1931a, 1931c, 1936a, 1937a, 1938a) 475, 
Lomnicki (1923) 476, Marchand (1937) 477, 
McKinsey (1939) 477, Moisseiev (1937) 478, 
Nagel (1936) 479, Reichenbach (1937) 488, 
Rice (1938) 488, Romanovsky (1931a) 489, 
Tornier (1929, 1930, 1936, 1937) 495, von 
Mises (1919a, b, 1928, 1931, 1936a, b, 1939c, 
1941) 497, Urban (1918) 496, Uspensky 
(1937) 496. 

Probits, Bibl., Bliss (1935, 1937) 447.

Product, distribution of, Bibl., C. C. Craig (1936a)
454.

Product-moment correlation, see Correlation.

Proficiency test of recruits, (Example 24.7) 240-2.

Proportionate frequencies, in variance-analysis, 228.


Proportions, tests of, Bibl., Swaroop (1938) 494.


Quadratic forms, see Independence of Quadratic 
Forms. 


Quality control, Bibl.: Becker and others (1930)
446, Jennett and Welch (1939) 471, E. S.
Pearson (1933a, 1934) 482, Shewhart (1931)
491, Simon (1941) 491, Welch (1936b) 498,
Wilks (1941) 500, Wolfowitz (1943) 501.

Quartiles, Bibl., Hojo (1931, 1933) 469.

Quasi-Latin squares, Bibl., Yates (1937a) 502.

Quasi-sufficiency, Bibl., Bartlett (1940) 445. See
Conditional Statistics.


Racial likeness, N.R., 358. Bibl., Morant (1939)
478, K. Pearson (1926b) 485. See Multi-
variate Analysis.

Rainfall in London, (Table 29.4, Figure 29.4) 367. 

Random component in time-series, 369; effect of
trend-elimination on, 378-87; tests for,
399.

—— migration, Bibl., Brownlee (1911) 449.

—— occurrences, Bibl., Morant (1921) 478.

—— order, tests of, 122-7. Bibl.: (runs, etc.)
André (1884) 444, Besson (1920) 446, Borel
(1933) 447, Denk (1936) 456, Fisher (1926b)
461, Gumbel (1943a) 466, Jones (1937c)
472, Kaucky (1936) 472, Mood (1940) 478,
von Bortkiewicz (1915a, 1917) 496, von
Mises (1921) 497, Wolfowitz (1943) 501.

—— paths, Bibl., McCrea (1936) 477, Pólya
(1938b) 487.

—— samples, tables of, Bibl., 
others (1934) 476. 

—— sampling numbers, Bibl.: Kendall and 
Babington Smith (1939a) 472, K. R. Nair 
(1938a) 479, Yule (1938a) 503. 

——— sequence, Bibl.: Copeland (1928, 1929, 
1932, 1936, 1937) 453, Dörge (1934, 1936)
458, Greville (1939) 466, Regan (1936, 
1938) 487, Rice (1939) 488, Swed and 
Eisenhart (1943) 494, Ville (1936a, b) 496, 
von Mises (1931, 1933) 497, Wald (19360, 
1937) 497, Young (1941) 502. 

—— variables, Bibl. : Cramér (1935a) 454, Cramér 
and others (1938) 454, de Finetti (1929) 
455, Eyraud (1938b) 459, Lévy (1934,
1935a, b, 1936c, 1939a, b) 475. See Proba- 
bility. 

Randomisation, and z-test, 209-13, 255-6; in 
design, 263-6. Bibl., E. S. Pearson (1937b,
1938) 483; and see Design. 

Randomised blocks, 213-14. Bibl.: Cornish 
(1940a) 453, McCarthy (1939) 477, Welch 
(1937) 498. See Blocks. 

Randomness, Bibl.: Borel (1937) 447, Dodd 
(1942) 457, Kendall (1941) 472, Kermack 
and McKendrick (1936, 1937) 473, Wiener 
(1938) 499. 

Range, test of, (Exercise 27.3) 327. Bibl. : Geary 
(1943) 464, Hartley (1942) 467, McKay and 
Pearson (1933) 477, Newman (1939) 480, 
Olds (1935) 481, E. S. Pearson (1926, 1932)
482, Pearson and Haines (1935a) 482,
Pearson and Hartley (1942, 1943) 483,
Romanovsky (1933b) 489, W. R. Thompson
(1938) 494, Tippett (1925) 495.

Rank correlation, 123, 441. Bibl. : Daniels (1944) 
455, Dantzig (1939) 455, Dubois (1939) 458, 
Hotelling and Pabst (1936c) 469, Kendall 
(1938b, 1942a) 472, Kendall and others 
(1939, 1939b) 472, Olds (1938b) 481, K.
Pearson (1914, 1921) 484, Pearson and 
Pearson (1931c, 1932) 486, “ Student ” 
(1921) 493, Wallis (1939) 498, Watkins 
(1933) 498, Woodbury (1940) 501. 

Ratio, distribution of, Bibl.: C. C. Craig (19290) 
453, Curtiss (1941) 454, Fieller (1932b) 460, 
Geary (1930) 464, Gordon (1941) 465, 
Hirschfeld (1937) 468, Kullback (1936a) 
474, Nicholson (1941) 481, van Uven (1932, 
1939) 496. 

Rectangular distribution, estimation of extremes,
(Example 17.15) 28; intrinsic accuracy,
(Exercise 17.11) 47; estimation by sample-
centre, (Exercise 17.16) 48; confidence
intervals for range, (Exercise 19.1) 83.
Bibl.: O. L. Davies (1932) 455, Dunlap
(1931) 458, Hall (192750) 467, Olds (1935)
481, Rietz (1931a) 488.

Region of acceptance, 63, 76, 270. 

Regression, Gauss’ theorem on residuals, 60-1; 
generally, 141-74; analytical theory, 
141-5; fitting of curvilinear regressions, 
145-53; standard errors and tests of sig- 
nificance, 153-8; equal steps of variate, 
159-67; multiple curvilinear, 167; addi- 
tion of new variates, 167-72; in analysis 
of variance, 233-6 ; relation with Hotelling’s 
T, 336-7 ; in discriminatory analysis, 344-5. 

Bibl.: R. G. D. Allen (1939) 443, H. V.
Allen (1938) 443, Andersson (1932) 443,
(1934) 444, Bartlett (1933a, 1938c) 445, F.
Bernstein (1937) 446, Blakeman (1905) 447,
S. S. Bose (1934a, b, 1938b) 448, Camp
(1925b) 450, Cochran (1938a) 452, Dodd
(1937b, c) 457, Dwyer (1937, 1941c) 458,
Eisenhart (1939) 459, Ezekiel (1930b) 460,
Fisher (1922b) 461, Galton (1886) 464,
Jones (19370) 472, Koopmans (1937) 474,
Mendershausen (1937a) 477, T. V. Moore
(1937) 478, Neyman (1926) 480, K. Pearson
(1896) 483, (1921, 1926a) 485, Quensel
(1936) 487, Richards (1931) 488, Roman-
ovsky (1926, 1931b) 489, Slutzky (1914)
491, K. Smith (1918) 492, Waugh (1942)
498, Welch (1935) 498, Wicksell (1934b)
499, Yates (1939d) 502, Yule (1936) 503.

—— coefficients, standard error of, 153-6; exact
tests of, 156-8.

Regular unbiassed critical regions, 318-19. 



Rejection of observations, Bibl. : Irwin (1925b) 
470, Pearson and Chandra Sekhar (1930) 
483, Rider (1933) 488, W. R. Thompson 
(1935) 494. 

Relaxed oscillations, Bibl., Le Corbeiller (1933) 
475, van der Pol (1930) 496. 

Reliability coefficients, Bibl., Stouffer (1936b) 493. 

Replication, 255. Bibl.: Bartlett (1938a) 445, 
Cochran (1937b, 1938b, 1939a) 452, Yates 
(1933a, b) 500, (1936d) 501. See Design. 

Representative method of sampling, Bibl. : A. T. 
Craig (1939) 453, Jensen (1925) 471, Ney- 
man (1933b, 1934) 480, Sukhatme (1935)
493. 

Residual, in variance-analysis, 178, 185-7. 

Ricker, W. E., confidence intervals for Poisson 
distribution, 81. 

Riemann zeta-function, Bibl., Jessen and Wintner 
(1935) 471. 

Risk, theory of, Bibl., Cramér (1923) 454, Esscher 
(1932) 459. 

Robinson, G., N.R., 394, 437. 

Roots of equations, distribution of, Bibl., Girshik 
(1939, 1942) 465. 

Routine analysis, Bibl.: Neyman (1939b, 1941b)
480, Przyborowski and Wilénski (1935b)
487, “Student” (1927) 493.

Roy, S. N., distribution of canonical correlations, 
357 and N.R., 359. 

Runs, in time-series, see Random Order. 


Sampling distributions, moments of, see k-statistics,
Moments.

—— inquiries, see Design.

—, miscellaneous, Bibl.: Bartky (1943) 445,
Bartlett (1937b) 445, Baten (1933b) 446,
Bowley (1925) 448, Burks (1933) 450, Clap-
ham (1931, 1936) 452, Cochran (1936,
1939b, 1942b) 452, A. T. Craig (1933a, b)
453, C. C. Craig (1931a) 453, Crum (1933)
454, David (1938b) 455, Hey (1938) 468,
Hilton (1924, 1928) 468, Kiser (1934) 473,
McKay (1934) 477, Neyman (1933a, 1934,
1938a) 480, Olds (1939, 1940) 481, Panse
(1939) 482, E. S. Pearson (1933a, 1934)
482, Pepper (1929) 486, Rhodes (1925) 488,
Rider (1931b) 488, Rietz (1937) 488, Shew-
hart and Winters (1928) 491, “Sophister”
(1928) 492.

—— surveys, Bibl., A. N. Bose (1941) 447, C.
Bose (1943) 447; and see Sampling, miscel-
laneous.

Sasuly, M., N.R., 394. 

Savur, S. R., N.R., 83. 

Scale, estimation of parameters of, 40-2; elimina-
tion of parameters of, 79-80; Pitman’s
tests of, 323-6. Bibl., Pitman (1939a, b)
486.




Scale, reading, Bibl., Yule (1927b) 503.

Scales of measurement, Bibl., Cochran (1943) 452.

Scatterance, N.R., 358.

Scedastic curve, 142. 

Scheffé, H., non-parametric tests, 322; N.R.,
304, 326.

Schoolchildren, tests of, (Example 25.1) 258-9, 
(Example 28.4) 351-2. 

Schultz, H., N.R., 394.

Schuster, Sir Arthur, significance of periodogram, 
434; N.R., 437. 

Seasonal effect, in time-series, 369. Bibl.: Bow- 
ley and Smith (1924) 448, Carmichael (1931) 
451, Carver (1932) 451, Crum (1925) 454, 
Detroit Edison Co. (1930) 456, Donner 
(1928) 457, Falkner (1924) 460, Gressens 
(1925) 466, Mendershausen (1937b) 478,
Robb (1929, 1930) 489, Wald (1936a) 497,
Wisniewski (1934) 501, Zrzavy (1933) 503.

Second Limit Theorem, Bibl., Fréchet and Shohat
(1931) 463. 

—— moment, see Variance. 

Seed in optical glass, (Example 23.6) 202-5. 

Seeds of wheat, germination of, (Example 23.7) 
207-9. 

Selective confidence intervals, 75-6. 

Semi-normal distribution, Bibl., Steffensen (1937) 
492. 

Seminvariants, see Cumulants, k-statistics. 

Sensitivity, of tests of significance, 256. 

Serial correlation, 402-4. See Correlogram. Bibl. : 
R. L. Anderson (1942) 443, Bartlett (1935c) 
445, Dixon (1944) 456, Kendall (1944a, b) 
473, Koopmans (1942) 474, Marples (1932) 
477, Schumann and Hofmeyer (1942) 490, 
Yule (1921) 502, (1926, 1927a) 503. 

Sheep population of England and Wales, (Table 
29.3, Figure 29.3) 366, (Example 29.5) 
385-6, (Example 30.5) 411, (Example 30.8) 
416-18.

Sheppard’s corrections, see Grouping Corrections.

Shortest confidence intervals, 71-5, 75-6.

Significance tests, 96-140, 269-327. See Statistical
Hypotheses. Bibl., Jeffreys (1938a) 471,
Peiser (1943) 486.

Silverstone, H., minimum variance, 61; (Exer-
cises 18.1, 18.2) 61.

Simaika, J., N.R., 304, 359. 

Similar regions, 283. Bibl., Feller (1938) 460. 

Simon, L. E., N.R., 61. 

Simple hypotheses, 269, 272-82, 317-26. 

Simultaneous estimation, of several parameters, 
34-44.

—— fiducial distributions, Bibl., Bartlett (1939a) 
445. 

Sinusoidal limit, N.R., 394. Bibl.: Marsueguerra
(1936) 477, Romanovsky (1931c, 1932a,
19334) 489, Slutzky (1937b) 491.




Skewness, Bibl., Frisch (1934a) 464, Garner (1932)
464.

Skulls (Egyptian), (Example 28.3) 345-8.

Slutzky, E., N.R., 394, 399.

Slutzky-Yule effect, 378-87, 399. Bibl., Slutzky
(1937b) 491, Yule (1921) 502.

Small numbers, law of, see Poisson Distribution. 

Smirnoff, N., ω²-test, 109.

Smith, H. Fairfield, N.R., 359.

—, K., minimum-χ², 55 and N.R., 61.

Smoothing, see Moving Averages, Trend. 

Soil, loss of weight in, (Example 22.3) 149-52, 
(Example 22.6) 158. 

Solomon, L., footnote, 51. 

Spearman, C., (Exercise 25.3) 267. 

Spearman’s factor theory, see Factor Analysis. 

— p, test of, 132. 

Speed tests in children, (Example 28.4 

Spelling ability in children (Example 25. 

Spencer’s formula in curve fitting, (Example
29.3) 376-7, 378-80, (Exercise 29.3 
(Example 30.2) 405. 

Spurious correlation, Bibl.: K. Pearson (1897b)
483, Spearman (1907, 1910) 492, Wicksell 
(1921) 499. 

Square of a variate, Bibl., Haldane (1941) 467.

Squariance, footnote 178.

Stabilising of variance, 207. 

Stability of series, see Lexis Theory. 

Stable laws of probability, Bibl.: Bochner (1937) 
447, Feldheim (1937a) 460, Khintchine and
Lévy (1936) 473, Khintchine (1938) 473.

Standard deviation, estimation of, (Example 17.5)
6-7, (Example 17.6) 11, 52. See Variance.

—— errors, in testing significance, 97-8; of
regression coefficients, 153-6. Bibl.: Derk-
son (1939) 456, Edgeworth (1908, 1909)
459, Eels (1929) 459, Hendricks (1934) 468,
Isserlis (1915, 1916) 470, Miller (1934) 478,
K. Pearson (1903, 1913, 1920) 484, (1924d)
485, K. Pearson and Lee (1908) 484, K.
Pearson and Filon (1898) 483.

—— Latin squares, 259.

Stationary time-series, 396. Bibl.: Khintchine
(1932, 1933, 1934) 473, Slutzky (1934) 491,
Wold (1938a, 1939) 501. See Time-series,
Correlogram.
Correlogram. 

Statistical hypotheses, definition, 269; errors of
first and second kind, 270-2; power
function, 272; simple hypotheses, 272-5;
best critical regions, 277-80; relation with
sufficient estimators, 281-2; composite
hypotheses, 282-3; similar regions, 283-7;
of several degrees of freedom, 287; linear
hypotheses, 292-5; likelihood criteria,
295; k samples, 295-302; bias, 307-26;
regions of Type A, 309-14, of Type A₁,
314-16, of Type B, 316-17, of Type C,
317-22; limiting properties, 322; Pitman's
tests, 323-6.

Bibl.: G. W. Brown (1940) 449, Chandra
Sekhar and Francis (1941) 451, Daly (1940)
454, Dantzig (1940) 455, Gumbel (1942)
466, R. W. Jackson (1936) 471, Kolod-
ziejczyk (1933, 1935) 474, Neyman (1935b,
1938b) 480, (1942) 481, Neyman and Pear-
son (1928, 1931a, 1933a, c, 1936a, 1938)
480, E. S. Pearson (1941, 1942a) 483,
Pitman (1939b) 486, Rietz (1938) 488,
Scheffé (1942a, 1943) 490, Wald (1939a)
497, (1941a) 498, Wilks (1935c, 1938a) 499,
Wolfowitz (1942) 501.

Statistical Review of England and Wales, data from, 
(Example 21.8) 120, (Example 21.9) 121. 

Stevens, W. L., test of significance in periodogram, 
434; N.R., 216. 

Stieltjes integrals, Bibl., Shohat (1930) 491. 

Stochastic convergence, 440. See Convergence in 
Probability. 

—— dependence, see Independence. 

—— processes, Bibl., Doob (1934a, 1937, 1938)
457, Feller (1936a) 460. See Probability.

Stock forecasting, Bibl., Cowles (1933) 453, Cowles 
and Jones (1937) 453. 

Stock, J. S., N.R., 266. 

Stratified sampling, 249-52. Bibl. : P. H. Ander- 
son (1942) 443, Baker (1930c) 444, G. M. 
Brown (1933) 449, Frankel and Stock (1939) 
463, McKay (1934) 477, Mood (1943) 478. 
See also Sampling, miscellaneous, Repre- 
sentative Method. 

“Student” (W. S. Gosset), see Gosset.

Studentisation, 79-81, 134. Bibl., Hartley (1938,
1944) 467, Newman (1939) 480.

“Student’s” distribution, confidence intervals
based on, 79-80; fiducial inference based 
on, 88; properties of, 100-2; in testing 
mean, 98-100 ; in non-normal case, 102-4 ; 
other uses, 104; in testing two means, 
109-10, 113-14; in testing Spearman’s p, 
124; in Pitman’s tests, 131, 132 ; in testing 
regressions, 156, 158, 172; in analysis of 
covariance, 244; (Example 26.9) 291. 

Bibl.: Bartlett (1935a) 445, C. C. Craig 
(1941a) 454, Daniels (19384) 454, Fisher 
(1926a) 461, Geary (19360) 464, Hendricks 
(1936) 468, P. L. Hsu (1938a) 469, N. L. 
Johnson and Welch (1940a) 471, Kerrich 
(1937) 473, Kolodziejczyk (1933) 474, Lader-
mann (1939) 474, McKay and others (1932)
477, Merrington (1942) 478, A. N. K. Nair
(1942) 479, Perlo (1933) 486, Rider (1929)
488, Rietz (1939) 488, Steffensen (1936) 492,
“Student” (1908a, 1931a) 493, Treloar and
Wilder (1934) 495.

—— hypothesis, 285-7. Bibl., Neyman and
Tokarska (1936b) 480, Przyborowski and
Wilénski (1935a) 487.

Stumpff, K., N.R., 437. 

Sufficient estimators, 7-12; given by maximum 
likelihood, 19; general form possessing, 
24—5; distribution of, 25; when range 
depends on parameter, 27-8; for several 
parameters, 39-40; giving minimum- 
variance estimators, 52; relation with 
confidence intervals, 74-5, 79; relation 
with U.M.P. tests, 281-2, with U.M.P.U. 
tests, 310. 

Bibl. : Bartlett (1936b, 1937c, 1940) 445, 
Darmois (1935) 455, Koopman (1936) 474, 
Neyman (1935a) 480, Neyman and Pearson 
(19362) 480, Pitman (1936) 486, Welch 
(1939a) 498. 

Sukhatme, P. V., tables for Behrens' test, 92, 111; 
(Exercise 26.8) 305-6 ; sampling moments, 
440. N.R., 94, 266, 304. 

Sum, distribution of, see Means. 

Summation convention, 329.

Sunspots, Bibl., Schuster (1906) 490, Yule (1927)
503.

Symmetric functions, Bibl., O'Toole (1931, 1932)
481. See Moments, k-statistics. 


T-distribution, see Hotelling’s T. 

Tabular differences, Bibl., Ladermann and Lowan 
(1939) 474. 

Tanburn, E., N.R., 137. 

Tang, P. C., linear hypotheses, 301; N.R., 303. 

Tchebycheff, P. L., (Exercise 22.4) 173; N.R., 172.

Tchebycheff-Hermite polynomials, Bibl.: Doetsch
(1934) 457, Erdélyi (1938) 459, Feldheim
(1937b) 460. See Gram-Charlier Series,
Orthogonal Polynomials.

Tchebycheff’s inequality, Bibl.: Berge (1938)
446, Bernstein (1937) 446, Camp (1922) 450, 
C. C. Craig (1933) 454, K. Pearson (1919) 
485, C. D. Smith (1930) 491. 

Tea-drinking, Bibl., Mahalanobis (1943) 476. 

Telephone service, Bibl., Newland and Neal (1939) 
479, Palm (1937) 482. 

Terminals of frequency-distribution, confidence 
intervals for, 83. 

Test construction, Bibl., Cureton and Dunlap
(1938) 454. 

Tests of significance, see Significance, Statistical
Hypotheses.

Tetrachoric functions, Bibl.: J. Henderson (1922)
468, K. Pearson (1912a, 1913a, b) 484, K.
Pearson and Heron (1913c) 484, Newbold
(1925) 479, Pearson and Pearson (1922b)
485.

Tetrad difference, (Exercise 28.10) 362. Bibl.,
Hotelling (1936b) 469, Wilks (1932d) 499.
See Factor Analysis.

Third moment, distribution of, Bibl., Pepper
(1932) 486.

Thompson, C., on 4-tests, 299; N.R., 303. 

Thompson, W. R., (Exercise 19.5) 84; N.R., 83. 

Thomson, G., (Example 25.1) 258-9.

Ties in ranking, 127, 441. 

Time-series, 363-439 ; examples of, 363-9 ; trend, 
371-8; effect of trend elimination, 378-87 ; 
variate difference method, 387-94 ; oscilla- 
tions, 397-9; tests for randomness, 399; 
types of oscillatory series, 395—402 ; serial 
correlations, 402-4; correlogram, 404-13 ; 
autoregressive schemes, 414-21; auto-
correlation function, 421-3; periodogram
analysis, 423-33; significance of a periodo-
gram, 433-5; lag correlation, 435-7.

Bibl.: Bartels (1935) 445, Darmois (1929)
455, Davis (1941) 455, Jones (1937b, c) 472,
Kendall (1944a, b) 473, Koopmans (1937, 
1940, 1941) 474, Macaulay (1931) 476, 
Roos (1934, 1936) 489, von Szeliski (1929) 
497, Wallis and Moore (1941) 498, Wold 
(19384) 501, Zaycoff (1936, 1937) 503. 

See also Correlogram, Harmonic Analysis, 
Periodicity. 

Tintner, G., variate-difference method, 393. N.R.,
394.

Tokarska, B., N.R., 303.

Tolerance limits, see Quality Control.

Trade cycles, see Periodicity.

Traffic signals, Bibl., Garwood (1940) 464.

Transformation of distributions, Bibl.: Baker 
(1930a, 1934) 444, Beall (1942) 446, Bliss 
(1938) 447, Curtiss (1943) 454, Frankel and 
Hotelling (1938) 463, Landahl (1938) 474, 
Rietz (19310) 488, Tricomi (1938) 495, 
Yasukawa (1925) 501, Zoch (1934) 503. 

Transvariation, Bibl., Castellano (1934, 1937) 451. 

Travers, R. M. W., N.R., 359.

Trend, 369-70, 371-87. Bibl.: Lorenz (1931,
1935) 476, Macaulay (1931) 476, Rhodes
(1921) 488, Sasuly (1934) 490, Schumann
(1938) 490, Sipos (1930) 491, Working and
Hotelling (1929) 501.

Trough, in time-series, 124. 

Truncated normal distribution, Bibl., Keyfitz
(1938) 473, Stevens (1937a) 493.

Turner, H. H., N.R., 437.

Turning-point, in time-series, 124.

Two samples, Bibl.: Behrens (1929) 446, Dixon
(1940) 456, P. L. Hsu (1938a) 469, Lengyel
(1939) 475, Mathisen (1943) 477, E. S.
Pearson (1929) 482, Pearson and Neyman
(1930) 482, K. Pearson (1911a) 484, (19314)
485, Peek (1937) 486, Rhodes (1924, 1925)
488, Romanovsky (1928) 489, Starkey
(1938) 492, Sukhatme (1935, 1936b) 493,
Swaroop (1938) 494, W. R. Thompson
(1933) 494, Wald and Wolfowitz (1940c)
498, Welch (1938a) 498, Yates (1939/)
501.

Type A, B, C, in statistical tests, 309-27. 

Type I distribution, (Exercise 17.17) 49. 

— II distribution, Bibl., Carlson (1932) 451.

—— III distribution, estimation of parameters
in, (Example 17.8) 20-1, (Example 17.13) 
26, (Example 17.19) 39, (Example 18.3) 
53-4; sufficiency, (Example 17.21) 40; 
centre of location of, epo 7.28 
confidence intervals for 
19.5) 74-5; fiducial c ri 
meter, 87. Bibl.: C. C. Craig (19292) 453,
Kullback (19362) 474, Olshen (1938) 481, 
Salvosa (1930) 490, Wicksell (1933) 499. 

— IV distribution, centre of location of, (Exer-
cise 17.15) 48; intrinsic accuracy of,
(Exercise 17.19) 49. 


Unbiassed estimators, 3-4; confidence intervals, 
76; tests, 309-27. 


Unequal subclasses, in variance-analysis, 220-4.
Bibl.: Brandt (1933) 449, Wald (19400)
497, (1941d) 498, Wilks (1938e) 500, Yates
(1934a) 501.

Uniformly most powerful tests, 276; unbiassed
tests, 309, N.R., 359.

U-shaped distribution, Bibl., Holzinger and Church
(1929) 469.

Variability, measures of, Bibl. : Castellano (1935) 
451, de Vergottini (1936) 456, Galvani 
(1931) 464, Gini (1912, 1930) 465, March 
(1926) 477, Pietra (1932a) 486, Vinci (1920)
496. 

Variance, analysis of, see Analysis of anion 

+—, distribution and tests of, Bibl.: Baker 
(19031, 1932, 1935, 1940) 444, Church 
(1925, 1926) 452, A. T. Craig (1932, 1938) 
453, Dunlap (1931).458, Fertig and Proehl 
(1937) 460, Greenwood and Greville (1939) 
466, Kondo (1930) 474, Le Roux (1931) 
475, K. Pearson (1931d) 486, Quensel 
(1938) 487, Rhodes (1927) 488, Rietz (1931a) 
488, Romanovsky (1925a) 489, Truksa 
(1940) 495, von Bortkiewicz (1922) 497,
Yasukawa (1925) 501. See also Fisher’s 
Distribution, k samples. 

—, estimation of, Bibl., O. L. Davies and
Pearson (1934) 455, P. L. Hsu (19380) 469. 

— ratio, Bibl. : S. S. Bose (1935) 448, Cochran 
(1941) 452, Finney (1938, 1941a) 460, 
Morgan (1939) 478, U. S. Nair (1941a, b) 
479, Scheffé (19425) 490. See also Fisher's 
Distribution. 

— —, test of, in normal samples, 104 ; difference 
of two variances, 115, (Example 26.8) 289. 
Variate-difference method, 387-94. Bibl. : Ander-
son (1914, 1928, 1926, 1929) 443, Cave-
Browne-Cave (1904) 451, Cave and Pearson
(1914) 451, Haavelmo (1941) 467, K.
Pearson and Elderton (1923a) 485, Robb
(1929) 489, “Student” (1914) 493, Tintner
(1935, 1940, 1941) 495, Zaycoff (1936, 1937)
503.

Variate transformation, in analysis of variance,
206-9. See Transformation. 

Variation, coefficient of, Bibl.: Hendricks and
Robey (1936) 468, McKay (1931) 477,
McKay and others (1932) 477.

Variety trials, Bibl., Yates (1936d, 1937a) 502.

Vector correlation, alienation coefficients, (Exer- 
cises 28.8, 28.9, 28.10) 361-2. 

— representation of a sample, Bibl., Bartlett
(19345) 445.

von Mises, R., ω²-test, 108; Irregular Kollektiv,
123.


Wald, A., most-selective confidence intervals,
82-3; limiting properties of tests, 322.
N.R., 83, 304, 326.

Walker, Sir Gilbert, time-series, 420 ; significance 
of a periodogram, 434. 

Wallace, N., N.R., 359. 

Wallis, W. A., phases in time-series, 126. N.R., 136.

Water-content in samples, (Example 23.3) 190-4,
(Example 23.4) 196-8.

Weather, effect on moths, (Example 22.10) 171-2. 

Welch, B. L., difference of two means, 
(Example 21.6) 113; (Exercise 21.7) 139;
Latin squares, 261; footnote 295. N.R.,
45, 83, 216, 304, 359. 

Wheat-price index (of Sir William Beveridge),
(Table 30.1) 396, (Example 30.4, Table 30.6, 
Figure 30.5) 409-10; (Table 30.9 and 
Figure 30.9) 425-30; (Example 30.5) 
431-2; (Example 30.10) 435. 

Wheat prices, and horse bai EM 30.10) 
436. 
Whittaker, Sir Edmund, periodogram (Exercise
30.10) 439, Calculus of Observations, N.R., 
394, 437. 

Wicksell, S. D., theorem on regressions, 143;
(Example 22.2) 144; (Exercises 22.1, 22.2,
22.3) 173. N.R., 172, 173.

Wiener, N., autocorrelation function, 422. 

Wilks, S. S., shortest confidence intervals, 82; 
A-tests, 299; Hotelling’s T, 337-8; dis- 
tribution of means, 341, 358; (Exercise 
19.1) 83, (Exercise 19.4) 84, (Exercises
28.4, 28.5) 360. N.R., 83, 245, 303, 304, 
359. 


Wilsdon, B. H., N.R., 245.


Wilson-Hilferty transformation of χ², 118.

Wishart, J., (Exercise 24.3) 246, (Exercises 28.1,
28.2) 359-60. N.R., 245, 359.

Wishart’s distribution, 330-5, 337-8, (Exercise 
28.3) 360. Bibl.: P. L. Hsu (1939a) 469, 
Ingham (1933) 470, Wishart (1928) 500, 
Wishart and Bartlett (1933c) 500. 

Wold, H., ω²-test, 108; (Exercise 25.3) 267;
time-series, 418; Carleman criterion, 440.
N.R., 266, 437.

Wolfowitz, J., confidence intervals for terminals 
of a distribution, 83. N.R., 304. 

Woodbury, M., tied ranks, 441.

Wool thread, weights of, (Example 23.2) 183-5.


Yates, F., tables of 4, 102; (Example 23.5) 200-2;
z-distribution, 206; (Example 23.8) 214;
(Example 24.1) 221-5; (Example 24.5)
230-3; design of experiments, 263. N.R.,
94, 216, 245, 266.


Yule, G. U., autoregressive series, 418; (Exercises
30.3 and 30.9) 439. N.R., 394, 437.


Zaycoff, R., variate-difference method, 393. N.R.,
394.


z-distribution, see Fisher's Distribution.


